LLM + Tool + RAG¶
The mainstream stack, and the exact point where it fails.
LLM + Tool + RAG is a strong starting point: retrieval reduces pure invention, and tools turn text into action. But the stack still lacks an enforcement layer that makes violations impossible rather than merely unlikely.
The baseline architecture¶
flowchart TB
%% Styles (brModel Standard)
classDef i fill:#D3D3D3,stroke-width:0px,color:#000;
classDef p fill:#B3D9FF,stroke-width:0px,color:#000;
classDef r fill:#FFFFB3,stroke-width:0px,color:#000;
classDef o fill:#C1F0C1,stroke-width:0px,color:#000;
classDef s fill:#FFB3B3,stroke-width:0px,color:#000;
I_U(["👤 User"]):::i
P_L("🧠 LLM"):::p
P_R("🔎 RAG retrieve"):::p
P_T("🧰 Tools / APIs"):::p
D_Gate{"✅ Constraint gate present?"}:::s
R_Lack(["⚠️ No hard constraints + weak trace"]):::r
O_A(["🗣️ Output (text/action proposal)"]):::o
S_Risk(["🛑 Silent violation risk"]):::i
O_Safe(["✅ Allowed output (traceable)"]):::o
I_U --> P_L
P_L --> P_R --> P_L
P_L --> P_T --> P_L
P_L --> D_Gate
D_Gate -->|"No"| R_Lack --> O_A --> S_Risk
D_Gate -->|"Yes"| O_Safe
%% Clickable nodes
click P_R "/methodology/llm-tool-rag/" "LLM + Tool + RAG"
click P_T "/methodology/llm-tool-rag/" "Tools"
Baseline mechanism: the 🧠 LLM loops over 🔎 retrieval and 🧰 tools, but whether the system is safe depends on a separate ✅ constraint gate. Without it, you can get fluent 🗣️ output with 🛑 silent violation risk.
The missing layer: constraint gate¶
LLM
A probabilistic language engine: great at synthesis and dialogue, but it does not intrinsically know what is permitted, true, or safe to execute.
Tools
Deterministic actions and APIs: they make the system do real work, but they will do the wrong thing if the plan or parameters are wrong.
RAG
Retrieval for grounding: it reduces pure invention, but retrieval returns candidates — not a verified chain of claims for this specific decision.
Why it’s insufficient: no hard rules
If constraints only live in text, the model can ignore them under pressure. High-stakes systems need non-negotiable checks outside the model.
Why it’s insufficient: weak audit trail
You can log prompts and retrieved chunks, but that is not an auditable reasoning artifact. Governance needs structured traces and provenance.
Why it’s insufficient: mismatch under change
After deployment, sources drift and policies evolve. Without validation gates, the system keeps producing fluent outputs on outdated assumptions.
Prompting is negotiable. Constraints are enforceable.
If a rule matters, it must live in a layer the model cannot “talk its way around”.
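What "a layer the model cannot talk its way around" means in practice: the rules are ordinary code (or SHACL shapes) evaluated outside the model. A minimal plain-Python sketch, standing in for a real SHACL run; the rule names and limits here are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Violation:
    rule: str
    message: str

# Hard rules live in code, outside the model: no prompt can rephrase them away.
# (Illustrative rules only; real systems would load these from governed policy.)
RULES = [
    ("max_amount", lambda action: action.get("amount", 0) <= 10_000,
     "amount exceeds the 10,000 hard limit"),
    ("allowed_tool", lambda action: action.get("tool") in {"search", "refund"},
     "tool is not on the allow-list"),
]

def check(action: dict) -> list[Violation]:
    """Return every violated rule; an empty list means the action conforms."""
    return [Violation(name, msg) for name, pred, msg in RULES if not pred(action)]
```

The point of the sketch is the direction of control: the model proposes, `check` disposes, and a failing proposal never reaches execution no matter how fluent the surrounding text is.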
flowchart TB
%% Styles (brModel Standard)
classDef i fill:#D3D3D3,stroke-width:0px,color:#000;
classDef p fill:#B3D9FF,stroke-width:0px,color:#000;
classDef r fill:#FFFFB3,stroke-width:0px,color:#000;
classDef o fill:#C1F0C1,stroke-width:0px,color:#000;
classDef s fill:#FFB3B3,stroke-width:0px,color:#000;
I_D(["🗣️ Draft answer / proposed action"]):::i
P_V("🔒 Validate constraints (SHACL)"):::p
R_Report(["🧾 Validation report (violations or conformance)"]):::r
D_OK{"✅ Conforms?"}:::s
O_O(["✅ Output / execute + record trace"]):::o
S_X(["🛑 Abstain / escalate + return violations"]):::i
I_D --> P_V --> R_Report
R_Report --> D_OK
D_OK -->|"Yes"| O_O
D_OK -->|"No"| S_X
%% Clickable nodes
click P_V "/methodology/constraints/" "Constraints & SHACL"
click O_O "/methodology/brcausalgraphrag/" "brCausalGraphRAG"
Decision point: the system produces a 🧾 validation report, then a ✅ conforms? gate decides whether to proceed. Passing yields ✅ execute + trace; failing yields 🛑 abstain/escalate and returns violations as structured feedback.
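The gate above can be sketched as a single function: run validation, build a report, and branch on conformance. This is an illustrative shape, not a specific library API; `validate` stands in for whatever produces violations (a SHACL run, a rule check), and the trace fingerprint is one simple choice among many:

```python
import hashlib
import json
from datetime import datetime, timezone

def gate(draft: dict, validate) -> dict:
    """Constraint gate: conforming drafts pass with a trace record,
    non-conforming drafts become a structured abstention."""
    violations = validate(draft)  # e.g. SHACL validation or hard-rule checks
    report = {
        "conforms": not violations,
        "violations": violations,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }
    if report["conforms"]:
        # Tamper-evident record of exactly what was approved.
        payload = json.dumps(draft, sort_keys=True).encode()
        report["trace_id"] = hashlib.sha256(payload).hexdigest()[:16]
        report["decision"] = "execute"
    else:
        report["decision"] = "abstain"  # violations return as structured feedback
    return report
```

Note that the failure branch does not produce prose: it returns the violations themselves, so the caller (or the model, on retry) gets machine-readable feedback instead of a polite refusal.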
Where it still breaks¶
Retrieval is not reasoning
RAG returns relevant text, not a valid causal path. The model can still stitch together incompatible pieces.
Policy is not “just more context”
Policies are constraints. If they only exist as text, they are bypassable and hard to audit.
No trace, no accountability
Without structured traces, you cannot reliably debug failures or identify which evidence changed the decision.
Silent uncertainty
The system can be fluent while wrong; abstention must be a designed outcome, not a polite suggestion.
Tool misuse and unsafe execution
Tool calls amplify impact. Without schema validation and policy checks, a small reasoning error becomes a real-world incident.
Inconsistent answers across runs
Different retrieval results or model versions can produce different conclusions. Without constraints and traces, you can’t guarantee stability.
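For the tool-misuse failure mode in particular, the smallest useful defense is schema validation of the proposed call before anything executes. A minimal sketch, assuming a hypothetical refund tool; real systems would use a proper schema language (JSON Schema, Pydantic), but the shape of the check is the same:

```python
def validate_tool_call(call: dict, schema: dict) -> list[str]:
    """Check a proposed tool call against a declared parameter schema
    before execution. Returns a list of problems; empty means OK."""
    problems = []
    params = call.get("params", {})
    for name, spec in schema.items():
        if name not in params:
            if spec.get("required", False):
                problems.append(f"missing required parameter: {name}")
            continue
        value = params[name]
        if not isinstance(value, spec["type"]):
            problems.append(
                f"{name}: expected {spec['type'].__name__}, got {type(value).__name__}"
            )
        elif "max" in spec and value > spec["max"]:
            problems.append(f"{name}: {value} exceeds hard cap {spec['max']}")
    return problems

# Hypothetical schema: the cap is a policy decision, enforced outside the model.
REFUND_SCHEMA = {
    "order_id": {"type": str, "required": True},
    "amount":   {"type": int, "required": True, "max": 500},
}
```

A call like `{"params": {"order_id": "A1", "amount": 9999}}` is rejected with a specific reason, which is exactly the structured feedback the abstain/escalate path needs.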
What to add for decision-grade systems¶
- Enforceable constraints (not guidelines)
- Provenance-first data (claims link to sources and versions)
- Trace objects (machine-verifiable reasoning artifacts)
- Abstention + escalation (explicit failure modes)
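The second and third items can be combined into one small artifact: a trace object whose claims carry source and version, with a stable fingerprint over the whole thing. A minimal sketch (field names are illustrative, not a fixed spec):

```python
import hashlib
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class Claim:
    text: str
    source_id: str       # which document the claim came from
    source_version: str  # pins the exact revision that was retrieved

@dataclass(frozen=True)
class Trace:
    decision: str
    claims: tuple[Claim, ...]

    def fingerprint(self) -> str:
        """Stable hash over decision + evidence. If a source drifts to a
        new version, re-running the decision yields a visibly different
        trace instead of silently reusing stale assumptions."""
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()[:16]
```

Because the fingerprint covers `source_version`, the "mismatch under change" failure mode becomes detectable: two runs that disagree will also disagree in their traces, and the diff points at exactly which evidence changed.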