Why Probabilistic AI Fails (in High-Stakes Work)
Failure mechanics
Plausibility is not epistemic validity.
Next-token prediction is a powerful compression engine. In high-stakes work, its core risk is not “inaccuracy” — it’s unverifiable confidence.
The illusion
LLMs are excellent at generating text that resembles correct answers. But resemblance is not the same as truth.
In practice, fluency can mask missing sources, missing constraints, and missing causal structure.
Why RAG helps, and why it still fails
Causal questions
“Why did X happen?” requires mechanisms and context, not just relevant passages.
Exceptions and footnotes
Policies and regulations live in edge cases. Retrieval often misses the clause that flips the decision.
Cross-document constraints
“This is allowed only if A and B and not C” is a constraint problem. Text similarity doesn’t enforce it.
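To make the constraint point concrete, here is a minimal sketch in Python. It assumes the three facts A, B, and C have already been extracted from their respective documents; the names `Facts`, `violations`, and `is_allowed` are hypothetical and not part of any system described here. The two cases differ in a single fact, so their source texts could look almost identical to a similarity search, yet the rule flips the outcome.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Facts:
    """Facts gathered from separate documents (hypothetical example)."""
    a: bool  # condition A, e.g. established by document 1
    b: bool  # condition B, e.g. established by document 2
    c: bool  # exclusion C, e.g. buried in a footnote of document 3


def violations(facts: Facts) -> List[str]:
    """Return the unmet parts of the rule 'A and B and not C'."""
    failed = []
    if not facts.a:
        failed.append("A is required")
    if not facts.b:
        failed.append("B is required")
    if facts.c:
        failed.append("C must not hold")
    return failed


def is_allowed(facts: Facts) -> bool:
    """The action is allowed only when no part of the rule is violated."""
    return not violations(facts)


# Two cases that differ in exactly one fact.
case_ok = Facts(a=True, b=True, c=False)
case_blocked = Facts(a=True, b=True, c=True)

print(is_allowed(case_ok), violations(case_ok))            # True []
print(is_allowed(case_blocked), violations(case_blocked))  # False ['C must not hold']
```

The rule is evaluated as logic rather than inferred from wording, and a failed check names the violated condition, which is the kind of explanation a text-similarity score cannot supply.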
What changes with glass-box systems
Traceable path
The system shows the reasoning path it took — not just a final answer.
Explicit sources
Every claim has provenance (where it came from, why it was selected).
Enforced constraints
Constraints are gates. If a constraint fails, the system refuses or escalates.
If the system can’t provide path + sources + constraints, it must abstain. This is not a UX preference — it’s an architectural constraint.
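The abstain rule can be sketched as a gate that sits between a draft answer and the user. This is an illustration under stated assumptions, not the implementation of any particular system: `Claim`, `Draft`, `Result`, `gate`, the `two_sources` constraint, and the file names used as sources are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Claim:
    text: str
    source: Optional[str]  # provenance; None means the claim has no source


@dataclass
class Draft:
    answer: str
    path: List[str]      # reasoning steps the system took
    claims: List[Claim]  # claims the answer depends on


@dataclass
class Result:
    status: str            # "answer" or "abstain"
    answer: Optional[str]  # None when the gate abstains
    reasons: List[str]     # why the gate abstained (empty on success)


Constraint = Callable[[Draft], bool]


def gate(draft: Draft, constraints: Dict[str, Constraint]) -> Result:
    """Release the answer only if path, sources, and constraints all hold."""
    reasons: List[str] = []
    if not draft.path:
        reasons.append("no reasoning path")
    if not draft.claims:
        reasons.append("no claims")
    for claim in draft.claims:
        if not claim.source:
            reasons.append(f"unsourced claim: {claim.text}")
    for name, check in constraints.items():
        if not check(draft):
            reasons.append(f"constraint failed: {name}")
    if reasons:
        return Result("abstain", None, reasons)
    return Result("answer", draft.answer, [])


def two_sources(draft: Draft) -> bool:
    """Hypothetical constraint: at least two distinct sources back the answer."""
    return len({c.source for c in draft.claims if c.source}) >= 2


constraints = {"two_sources": two_sources}

grounded = Draft(
    answer="Approved under clause 7.",
    path=["matched request to clause 7", "checked exclusions in annex B"],
    claims=[Claim("Clause 7 applies", "policy.pdf#7"),
            Claim("No exclusion applies", "annex-b.pdf#2")],
)
thin = Draft(
    answer="Approved under clause 7.",
    path=["matched request to clause 7"],
    claims=[Claim("Clause 7 applies", "policy.pdf#7")],
)
ungrounded = Draft(
    answer="Probably fine.",
    path=[],
    claims=[Claim("It is usually allowed", None)],
)

for draft in (grounded, thin, ungrounded):
    result = gate(draft, constraints)
    print(result.status, result.reasons)
```

In this sketch the gate never returns a partial answer: any missing path, missing source, or failed constraint yields `abstain` together with the reasons, which a caller could use to escalate to a human reviewer.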
Diagram: plausible text vs decision-grade pipeline
flowchart TB
%% Styles (brModel Standard)
classDef i fill:#D3D3D3,stroke-width:0px,color:#000;
classDef p fill:#B3D9FF,stroke-width:0px,color:#000;
classDef r fill:#FFFFB3,stroke-width:0px,color:#000;
classDef o fill:#C1F0C1,stroke-width:0px,color:#000;
classDef s fill:#FFB3B3,stroke-width:0px,color:#000;
S_User("👤 User"):::s
I_Req(["📥 Request / decision context"]):::i
P_LLM("🧠 LLM generates"):::p
R_Text(["📝 Plausible text"]):::r
O_Risk(["⚠️ Risk: confident fabrication (missing evidence + missing constraints)"]):::o
P_Retrieve("🧭 Retrieve evidence"):::p
R_Evidence(["🔎 Evidence set (sources + provenance)"]):::r
P_Validate("🔒 Validate constraints"):::p
G_OK{"Valid?"}:::s
R_Trace(["🧾 Trace log (what/why/source)"]):::r
O_Decision(["✅ Decision-grade output (answer + audit trail)"]):::o
O_Refuse(["🛑 Refuse / escalate (no guessing)"]):::o
S_User --> I_Req
I_Req --> P_LLM --> R_Text --> O_Risk
I_Req --> P_Retrieve --> R_Evidence --> P_Validate --> G_OK
G_OK -->|"yes"| R_Trace --> O_Decision
G_OK -->|"no"| O_Refuse
%% Clickable nodes
click P_Retrieve "/methodology/llm-tool-rag/" "LLM + Tool + RAG"
click P_Validate "/methodology/constraints/" "Constraints & SHACL"
click R_Trace "/reasoners/governance/" "Governance"
⚠️ This diagram contrasts plausible text with a decision-grade pipeline: retrieval → constraint validation → trace → output, with refusal as the safe default when validity fails.
Diagram: where RAG fails
flowchart TB
%% Styles (brModel Standard)
classDef i fill:#D3D3D3,stroke-width:0px,color:#000;
classDef p fill:#B3D9FF,stroke-width:0px,color:#000;
classDef r fill:#FFFFB3,stroke-width:0px,color:#000;
classDef o fill:#C1F0C1,stroke-width:0px,color:#000;
classDef s fill:#FFB3B3,stroke-width:0px,color:#000;
I_Query(["📥 Question"]):::i
P_Retrieve("🔎 Retrieve top-k chunks"):::p
R_Snips(["📄 Selected snippets"]):::r
P_Synth("🧠 LLM synthesizes"):::p
O_Text(["📝 Output text"]):::o
I_Edge(["📌 Edge-case clause (often not retrieved)"]):::i
I_Cross(["🔗 Cross-document constraint (A and B and not C)"]):::i
I_Mech(["⚙️ Mechanism / causal model (not guaranteed)"]):::i
P_Fix("🧱 Add structure"):::p
R_Model(["🧠 Domain model + constraints (ground truth structure)"]):::r
O_Glass(["✅ Glass-box output (traceable + governed)"]):::o
I_Query --> P_Retrieve --> R_Snips --> P_Synth --> O_Text
I_Edge -. "missing" .-> R_Snips
I_Cross -. "not enforced" .-> P_Synth
I_Mech -. "not represented" .-> P_Synth
O_Text -. "risk" .-> P_Fix --> R_Model --> O_Glass
%% Clickable nodes
click R_Model "/methodology/constraints/" "Constraints & SHACL"
click O_Glass "/reasoners/governance/" "Governance"
📌 This diagram highlights why naive RAG breaks: it can miss edge-case clauses, fail to enforce cross-document constraints, and leave mechanisms unrepresented. An explicit domain model with constraints addresses all three.
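The missed edge-case clause can be reproduced with a toy retriever. The sketch below is a deliberate simplification: it ranks chunks by shared words, whereas production RAG systems typically rank by embedding similarity, and the policy text, chunk ids, and function names are invented for illustration. The failure mode is the same in both cases: the clause that flips the decision shares little vocabulary with the question, so it falls outside the top-k.

```python
import re
from typing import Dict, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def top_k(query: str, chunks: Dict[str, str], k: int) -> List[str]:
    """Return the ids of the k chunks sharing the most words with the query."""
    q = tokens(query)
    ranked = sorted(
        chunks,
        key=lambda cid: len(q & tokens(chunks[cid])),
        reverse=True,
    )
    return ranked[:k]


# Invented policy snippets; the footnote is the clause that flips the decision.
chunks = {
    "travel-policy": "Employees can expense a flight to a conference.",
    "fare-classes": "Business class is a permitted fare class for a long flight.",
    "footnote-3": "Footnote 3: premium fares are not reimbursed during a spending freeze.",
}

query = "Can I expense a business class flight to the conference?"
retrieved = top_k(query, chunks, k=2)

print(retrieved)                  # ['travel-policy', 'fare-classes']
print("footnote-3" in retrieved)  # False
```

Both retrieved snippets support a "yes", so a model synthesizing from them has no signal that an exception exists. Raising `k` would help in this toy case, but it does not guarantee that the exception is retrieved or applied in general, which is why the text treats exceptions as constraints to enforce and not as passages to find.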