Manufacturing: Root-Cause Analysis Across the Supply Chain
Case study: manufacturing
Quality failures are causal systems, not isolated defects.
Manufacturing issues rarely have one cause. They propagate through process steps, tooling, suppliers, and environmental conditions. We make those chains explicit and auditable.
The question
How do we identify root causes of quality failures when evidence spans sensors, process logs, maintenance events, and supplier batches, and decisions must be justified?
Failure modes to avoid
Correlation traps
Spurious correlations appear in high-dimensional sensor data.
Missing context
Process step dependencies and maintenance history are often disconnected.
Non-reproducible investigations
Root-cause analysis becomes tribal knowledge without traces.
Unsafe actions
Line stops, recalls, and supplier blocks must be governed and reviewed.
Batch confounding
Supplier lots, shifts, and ambient conditions can confound signals unless modeled explicitly.
Measurement drift
Sensor calibration and threshold changes can look like process change without a provenance trail.
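The batch-confounding failure mode can be illustrated with a small sketch. The tool names, lot names, and counts below are invented for illustration. Pooled by tool, tool_B looks far worse than tool_A. Stratified by supplier lot, both tools show the same defect rate within each lot, so in this made-up data the lot mix drives the signal, not the tool.

```python
from collections import defaultdict

# Hypothetical counts: (tool, supplier_lot, units, defects). Illustrative only.
RECORDS = [
    ("tool_A", "lot_1", 900, 9),
    ("tool_A", "lot_2", 100, 10),
    ("tool_B", "lot_1", 100, 1),
    ("tool_B", "lot_2", 900, 90),
]

def defect_rates(records, key):
    """Aggregate defect rate per group, where key(record) names the group."""
    units = defaultdict(int)
    defects = defaultdict(int)
    for rec in records:
        group = key(rec)
        units[group] += rec[2]
        defects[group] += rec[3]
    return {g: defects[g] / units[g] for g in units}

# Pooled view: one rate per tool, ignoring which lot was processed.
pooled = defect_rates(RECORDS, key=lambda r: r[0])

# Stratified view: one rate per (tool, lot) pair.
stratified = defect_rates(RECORDS, key=lambda r: (r[0], r[1]))

for group, rate in sorted(pooled.items()):
    print("pooled", group, f"{rate:.3f}")
for group, rate in sorted(stratified.items()):
    print("stratified", group, f"{rate:.3f}")
```

Stratification is only one way to control a confounder, and it requires that the confounder (here, the lot) is recorded for every unit in the first place.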
What changes with causal chains
flowchart TB
%% Styles (brModel Standard)
classDef i fill:#D3D3D3,stroke-width:0px,color:#000;
classDef p fill:#B3D9FF,stroke-width:0px,color:#000;
classDef r fill:#FFFFB3,stroke-width:0px,color:#000;
classDef o fill:#C1F0C1,stroke-width:0px,color:#000;
classDef s fill:#FFB3B3,stroke-width:0px,color:#000;
I_B(["Batch / lot"]):::i
R_Sup(["Supplier COA + lot history"]):::r
R_Proc(["Process recipe + setpoints"]):::r
R_Tool(["Tool calibration + maintenance"]):::r
R_Env(["Environment<br>(temp, humidity)"]):::r
R_Sens(["Sensor telemetry"]):::r
P_Prov("Provenance + time alignment"):::p
R_EB(["Evidence bundle<br>(joined by batch/time)"]):::r
P_CG("Build causal graph"):::p
R_CG(["Process causal graph<br>(steps, tools, suppliers)"]):::r
P_Anom("Detect anomalies"):::p
R_Q(["Quality signals<br>(yield, defects)"]):::r
G_Drift{"Drift detected?"}:::s
G_Conf{"Confounders controlled?"}:::s
S_F(["Failure / deviation"]):::i
O_R(["Root-cause candidates<br>(with evidence per link)"]):::o
R_Tr(["Trace object<br>(batch → signals → causes)"]):::r
I_B --> P_Prov
R_Sup --> P_Prov
R_Proc --> P_Prov
R_Tool --> P_Prov
R_Env --> P_Prov
R_Sens --> P_Prov
P_Prov --> R_EB --> P_CG --> R_CG --> P_Anom --> R_Q --> G_Drift
G_Drift -->|"yes"| S_F --> G_Conf
G_Drift -->|"no"| G_Conf
G_Conf -->|"yes"| O_R --> R_Tr
G_Conf -->|"no"| S_F --> R_Tr
The mechanism is multi-source: supplier lots, recipes, tooling, environment, and telemetry are merged into an evidence bundle, which is then turned into a causal graph. Drift and confounders become explicit gates. The output is a set of root-cause candidates with evidence per link, packaged as a trace object.
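As a rough illustration of the "joined by batch/time" step, the sketch below builds an evidence bundle for one batch by taking, per source, the latest record at or before the failure time (an as-of join). The `Evidence` type, the field names, and the sample records are assumptions made for this example. They do not describe the actual data model, and real time alignment also has to handle clock skew between systems.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass
class Evidence:
    batch_id: str
    source: str       # e.g. "sensor", "maintenance", "supplier" (hypothetical labels)
    timestamp: float  # seconds on a shared clock, assumed already aligned
    payload: dict

def latest_at_or_before(events, t):
    """As-of lookup: most recent event with timestamp <= t, or None.
    `events` must be sorted by timestamp."""
    times = [e.timestamp for e in events]
    i = bisect_right(times, t)
    return events[i - 1] if i else None

def build_bundle(batch_id, failure_time, evidence):
    """For one batch, keep per source the latest record at or before the failure."""
    by_source = {}
    for e in sorted(evidence, key=lambda e: e.timestamp):
        if e.batch_id == batch_id:
            by_source.setdefault(e.source, []).append(e)
    return {src: latest_at_or_before(evts, failure_time)
            for src, evts in by_source.items()}

# Invented sample records for two batches.
evidence = [
    Evidence("B-17", "sensor", 100.0, {"temp_c": 181}),
    Evidence("B-17", "sensor", 160.0, {"temp_c": 195}),
    Evidence("B-17", "maintenance", 120.0, {"event": "nozzle swap"}),
    Evidence("B-17", "sensor", 220.0, {"temp_c": 183}),
    Evidence("B-18", "sensor", 150.0, {"temp_c": 180}),
]

bundle = build_bundle("B-17", failure_time=200.0, evidence=evidence)
for source, record in sorted(bundle.items()):
    print(source, record.timestamp if record else None, record.payload if record else None)
```

A source whose only records fall after the failure time maps to `None`, which keeps the gap visible instead of silently dropping the source.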
Diagram: governed RCA workflow
flowchart TB
%% Styles (brModel Standard)
classDef i fill:#D3D3D3,stroke-width:0px,color:#000;
classDef p fill:#B3D9FF,stroke-width:0px,color:#000;
classDef r fill:#FFFFB3,stroke-width:0px,color:#000;
classDef o fill:#C1F0C1,stroke-width:0px,color:#000;
classDef s fill:#FFB3B3,stroke-width:0px,color:#000;
I_Inc(["Incident"]):::i
P_E("Collect evidence"):::p
R_Src(["Sources<br>(MES, SCADA, CMMS, supplier)"]):::r
P_Prov("Validate provenance"):::p
G_Prov{"Provenance ok?"}:::s
P_Path("Generate causal paths"):::p
R_Path(["Path candidates<br>(with assumptions)"]):::r
G_Ev{"Evidence sufficient?"}:::s
P_Conf("Confounder checks"):::p
G_Conf{"Confounders controlled?"}:::s
P_V("Validate constraints + required evidence"):::p
G_OK{"Gates pass?"}:::s
O_A(["RCA report + recommendation"]):::o
R_Tr(["RCA trace bundle<br>(evidence + paths + gates)"]):::r
S_X(["Abstain + request missing data"]):::i
I_Inc --> P_E --> R_Src --> P_Prov --> G_Prov
G_Prov -->|"no"| S_X
G_Prov -->|"yes"| P_Path --> R_Path --> G_Ev
G_Ev -->|"no"| S_X
G_Ev -->|"yes"| P_Conf --> G_Conf
G_Conf -->|"no"| S_X
G_Conf -->|"yes"| P_V --> G_OK
G_OK -->|"yes"| O_A --> R_Tr
G_OK -->|"no"| S_X --> R_Tr
%% Clickable nodes
click P_V "/methodology/constraints/" "Constraints & SHACL"
click P_Path "/methodology/causalgraphrag/" "CausalGraphRAG"
RCA becomes reproducible when gates are explicit: provenance must hold, evidence must be sufficient, confounders must be controlled, and only then do constraints approve a recommendation. When any gate fails, the correct output is abstention with a precise missing-data request, not a forced conclusion.
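The gate sequence can be sketched as an ordered list of checks that stops at the first failure and returns an abstention carrying a missing-data request. Everything here is hypothetical: the `GateResult` and `RcaOutcome` types, the shape of the `case` dictionary, and the three gate functions are assumptions for illustration. The final constraint-validation gate from the diagram is left out for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class GateResult:
    name: str
    passed: bool
    missing: list = field(default_factory=list)  # what to request if the gate fails

@dataclass
class RcaOutcome:
    status: str         # "recommend" or "abstain"
    gate_log: list      # every gate evaluated, kept for the trace bundle
    missing_data: list

def run_gates(case, gates):
    """Evaluate gates in order; abstain at the first failure."""
    log = []
    for gate in gates:
        result = gate(case)
        log.append(result)
        if not result.passed:
            return RcaOutcome("abstain", log, result.missing)
    return RcaOutcome("recommend", log, [])

def provenance_ok(case):
    missing = [f"calibration record for {s['name']}"
               for s in case["sources"] if not s.get("calibration_id")]
    return GateResult("provenance", not missing, missing)

def evidence_sufficient(case):
    missing = [f"evidence for link {edge}"
               for edge, count in case["evidence_per_link"].items() if count == 0]
    return GateResult("evidence", not missing, missing)

def confounders_controlled(case):
    missing = [f"stratified data for {c}"
               for c in case["confounders"] if c not in case["controlled"]]
    return GateResult("confounders", not missing, missing)

GATES = [provenance_ok, evidence_sufficient, confounders_controlled]

# Invented case: one sensor lacks a calibration record.
case = {
    "sources": [
        {"name": "oven_temp", "calibration_id": "CAL-1"},
        {"name": "torque", "calibration_id": None},
    ],
    "evidence_per_link": {"lot->viscosity": 3},
    "confounders": ["shift"],
    "controlled": ["shift"],
}

outcome = run_gates(case, GATES)
print(outcome.status, outcome.missing_data)
```

Returning the gate log alongside the status means an abstention is as traceable as a recommendation.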
Diagram: intervention approval gates (preventing unsafe fixes)
flowchart TB
%% Styles (brModel Standard)
classDef i fill:#D3D3D3,stroke-width:0px,color:#000;
classDef p fill:#B3D9FF,stroke-width:0px,color:#000;
classDef r fill:#FFFFB3,stroke-width:0px,color:#000;
classDef o fill:#C1F0C1,stroke-width:0px,color:#000;
classDef s fill:#FFB3B3,stroke-width:0px,color:#000;
I_Fix(["Proposed intervention<br>(tooling, recipe, supplier)"]):::i
G_Impact{"High impact?"}:::s
G_Ev{"Evidence sufficient?"}:::s
G_Safe{"Safety constraints pass?"}:::s
P_Pilot("Pilot / sandbox test"):::p
G_Pilot{"Pilot success?"}:::s
P_RB("Define rollback plan"):::p
G_Sign{"Sign-offs complete?"}:::s
O_Do(["Execute change"]):::o
P_Mon("Monitor outcome"):::p
G_Reg{"Regression detected?"}:::s
S_Rev(["Require review / sign-off"]):::i
S_Stop(["Stop + rollback"]):::i
R_Tr(["Change trace bundle<br>(tests + approvals + results)"]):::r
I_Fix --> G_Impact
G_Impact -->|"yes"| G_Ev
G_Impact -->|"no"| G_Ev
G_Ev -->|"no"| S_Rev --> R_Tr
G_Ev -->|"yes"| G_Safe
G_Safe -->|"no"| S_Rev
G_Safe -->|"yes"| P_Pilot --> G_Pilot
G_Pilot -->|"no"| S_Rev
G_Pilot -->|"yes"| P_RB --> G_Sign
G_Sign -->|"no"| S_Rev
G_Sign -->|"yes"| O_Do --> P_Mon --> G_Reg
G_Reg -->|"yes"| S_Stop --> R_Tr
G_Reg -->|"no"| R_Tr
The fix is also a governed decision: impact, evidence, and safety constraints gate the intervention; then a pilot test, rollback plan, and sign-offs are required. After execution, monitoring determines whether to keep the change or roll it back, and the full lifecycle is captured in a trace bundle.
Outputs
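A minimal sketch of the pre-execution gates, assuming a hypothetical `Intervention` record whose fields mirror the diagram's decision nodes. Treating an empty rollback plan as a failed gate is an assumption of this sketch; the diagram shows the rollback plan as a process step, not a gate.

```python
from dataclasses import dataclass

@dataclass
class Intervention:
    description: str
    evidence_sufficient: bool
    safety_pass: bool
    pilot_success: bool
    rollback_plan: str
    signoffs: set
    required_signoffs: set

def approve(iv):
    """Walk the gates in diagram order.
    Returns (decision, trace), where trace lists each gate result seen."""
    checks = [
        ("evidence", iv.evidence_sufficient),
        ("safety", iv.safety_pass),
        ("pilot", iv.pilot_success),
        ("rollback_plan", bool(iv.rollback_plan.strip())),  # assumption: must be non-empty
        ("signoffs", iv.required_signoffs <= iv.signoffs),  # all required roles signed
    ]
    trace = []
    for name, ok in checks:
        trace.append((name, ok))
        if not ok:
            return "require_review", trace
    return "execute", trace

def after_execution(regression_detected):
    """Post-execution monitoring gate."""
    return "stop_and_rollback" if regression_detected else "keep"

# Invented example: the process engineering sign-off is still missing.
iv = Intervention(
    description="Lower mold temperature setpoint by 5 C",
    evidence_sufficient=True,
    safety_pass=True,
    pilot_success=True,
    rollback_plan="Restore previous recipe revision",
    signoffs={"quality"},
    required_signoffs={"quality", "process_eng"},
)

decision, trace = approve(iv)
print(decision, trace[-1])
```

Because the trace records every gate that was evaluated, a rejected change leaves the same kind of audit trail as an executed one.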
Root-cause paths
Mechanistic chains with evidence per edge and explicit assumptions.
Traceable interventions
Line adjustments, supplier actions, and mitigations tied to the trace artifact.
Faster postmortems
Investigations become repeatable and comparable over time.
Governed escalation
High-impact actions trigger review gates and mandatory sign-offs.
Supplier propagation map
How upstream batch and supplier events flow into downstream quality signals, with evidence per link.
Change-impact analysis
Before you adjust a process, the system can show which constraints, steps, and failure modes the change touches.
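Change-impact analysis can be pictured as reachability over the process causal graph: starting from the node being changed, collect everything downstream. The graph below is a hypothetical injection-molding example, and the node names are assumptions. A real graph would also carry evidence and constraints on each edge.

```python
from collections import deque

# Hypothetical process graph: an edge A -> B means "A influences B".
GRAPH = {
    "supplier_lot": ["resin_viscosity"],
    "resin_viscosity": ["fill_step"],
    "mold_temp_setpoint": ["fill_step", "cooling_step"],
    "fill_step": ["short_shot_defect"],
    "cooling_step": ["warpage_defect"],
    "short_shot_defect": [],
    "warpage_defect": [],
}

def downstream(graph, start):
    """Breadth-first search: every node reachable from `start`, in visit order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order

touched = downstream(GRAPH, "mold_temp_setpoint")
print(touched)
```

The same traversal started from `supplier_lot` gives a simple supplier propagation map for this toy graph.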