LLM + Tool + RAG
The mainstream stack, and the exact point where it fails.
LLM + Tool + RAG is a strong starting point: retrieval reduces pure invention and tools turn text into action. But it still lacks an enforcement layer that makes violations impossible.
The baseline architecture
```mermaid
flowchart LR;
  U["User"] --> L["LLM"];
  L -->|"Search / retrieve"| R["RAG"];
  R --> L;
  L -->|"Call tools"| T["Tools / APIs"];
  T --> L;
  L --> A["Answer"];
```
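In code, the loop is roughly the following. This is a minimal sketch, with `retrieve`, `call_tool`, and `generate` as hypothetical stand-ins for the retriever, tool layer, and model client; none of these names refer to a specific library.

```python
# Baseline LLM + Tool + RAG loop: retrieve context, let the model draft, act, answer.
# All three helpers are placeholders for a real stack.

def retrieve(query: str) -> list[str]:
    # Stand-in for a vector / keyword retriever.
    return ["Refund policy v3: refunds are allowed within 30 days of purchase."]

def call_tool(name: str, args: dict) -> str:
    # Stand-in for a tool / API dispatcher.
    return f"tool {name} executed with {args}"

def generate(prompt: str) -> str:
    # Stand-in for the model call; here it just returns a canned draft.
    return "Refund approved per policy."

def answer(user_query: str) -> str:
    context = retrieve(user_query)                    # RAG step
    prompt = "\n".join(context) + "\n\nUser: " + user_query
    draft = generate(prompt)                          # LLM step
    if "refund" in draft.lower():                     # naive tool trigger
        call_tool("issue_refund", {"reason": draft})  # Tools / APIs step
    return draft                                      # no gate: the draft goes straight out

print(answer("Can I get a refund for an order from last year?"))
```

Note what is missing: the draft flows from the model to the tool layer and to the user without anything checking it against the policy text that was just retrieved.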
The missing layer: constraint gate
Prompting is negotiable. Constraints are enforceable.
If a rule matters, it must live in a layer the model cannot “talk its way around”.
```mermaid
flowchart TB;
  D["Draft answer / action"] --> V["Validate constraints"];
  V -->|"Pass"| O["Output / execute"];
  V -->|"Fail"| X["Abstain + explain"];
```
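A minimal sketch of such a gate, assuming constraints are plain predicates evaluated on the drafted action outside the model; the rule names, fields, and thresholds below are illustrative, not a fixed schema.

```python
# Constraint gate: the draft only executes if every rule passes; otherwise abstain.
# Rules live in code, not in the prompt, so the model cannot negotiate them away.

RULES = {
    "within_refund_window": lambda action: action.get("days_since_purchase", 10**9) <= 30,
    "amount_under_limit":   lambda action: action.get("amount", 0) <= 500,
}

def gate(draft_action: dict) -> dict:
    failures = [name for name, rule in RULES.items() if not rule(draft_action)]
    if failures:
        # Fail branch: abstain and explain which constraints were violated.
        return {"status": "abstain", "violated": failures}
    # Pass branch: safe to output / execute.
    return {"status": "execute", "action": draft_action}

print(gate({"type": "refund", "amount": 120, "days_since_purchase": 400}))
# -> {'status': 'abstain', 'violated': ['within_refund_window']}
```

Because the rules run as code on the drafted action, a persuasive but non-compliant draft fails in exactly the same way as a blunt one.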
Where it still breaks
Retrieval is not reasoning
RAG returns relevant text, not a valid causal path. The model can still stitch together incompatible pieces.
Policy is not “just more context”
Policies are constraints. If they only exist as text, they are bypassable and hard to audit.
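To make the contrast concrete, here is the same policy written twice, once as prompt text and once as a declared, versioned rule; the policy id, version string, and field names are invented for illustration.

```python
# The same policy, twice. As prompt text it is negotiable; as a declared rule it is
# evaluated deterministically, and every evaluation can be logged and audited.

POLICY_AS_PROMPT = "Please never approve discounts above 15%."   # bypassable

POLICY_AS_RULE = {                                               # enforceable, auditable
    "id": "discount-cap",
    "version": "2024-01",
    "check": lambda action: action.get("discount_pct", 0) <= 15,
}

def enforce(action: dict) -> bool:
    ok = POLICY_AS_RULE["check"](action)
    # An audit line tying the decision to a specific policy version.
    print(f"policy={POLICY_AS_RULE['id']}@{POLICY_AS_RULE['version']} action={action} ok={ok}")
    return ok

enforce({"discount_pct": 40})   # logged as a violation, however the model phrases the draft
```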
No trace, no accountability
Without structured traces, you cannot reliably debug failures or identify which evidence changed the decision.
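A sketch of what such a trace object could look like; the field names are illustrative, and the point is that the artifact is structured and replayable rather than buried in free-form text.

```python
# A structured trace: which evidence was used, which checks ran, and the outcome.

from dataclasses import dataclass, field, asdict
import json

@dataclass
class Trace:
    query: str
    evidence: list[dict] = field(default_factory=list)    # e.g. {"source": ..., "version": ...}
    checks: dict[str, bool] = field(default_factory=dict)
    decision: str = "pending"

trace = Trace(query="refund for order 1234")
trace.evidence.append({"source": "refund-policy.md", "version": "v3", "quote": "within 30 days"})
trace.checks["within_refund_window"] = False
trace.decision = "abstain"

print(json.dumps(asdict(trace), indent=2))   # a replayable artifact for debugging and audit
```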
Silent uncertainty
The system can be fluent while wrong; abstention must be a designed outcome, not a polite suggestion.
What to add for decision-grade systems
- Enforceable constraints (not guidelines)
- Provenance-first data (claims link to sources and versions)
- Trace objects (machine-verifiable reasoning artifacts)
- Abstention + escalation (explicit failure modes; see the sketch after this list)
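A closing sketch tying the last two items together: an explicit outcome type in which answering, abstaining, and escalating are all designed results. The names and conditions are illustrative.

```python
# Explicit outcomes: answering is only one of three designed results.

from enum import Enum

class Outcome(Enum):
    ANSWER = "answer"
    ABSTAIN = "abstain"       # constraints failed or evidence lacks provenance
    ESCALATE = "escalate"     # high-stakes case routed to a human

def decide(constraints_passed: bool, has_provenance: bool, high_stakes: bool) -> Outcome:
    if not constraints_passed or not has_provenance:
        return Outcome.ABSTAIN
    if high_stakes:
        return Outcome.ESCALATE
    return Outcome.ANSWER

print(decide(constraints_passed=True, has_provenance=False, high_stakes=False))  # Outcome.ABSTAIN
```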