An LLM proposes structure but cannot ground it in your specific mechanisms. Predictive ML finds correlations but cannot tell you what would happen under intervention. Only an SCM represents what causes what, and only an SCM turns the question "what would have happened differently?" into something that can actually be computed.
But an SCM does not replace the expert who knows the mechanisms. It encodes what they know — explicitly, inspectably, and in a form the organization can keep after the expert is gone. That is the move worth understanding.
Three tools, three jobs
The question "can an SCM replace a domain expert?" usually conflates three different tools doing three different jobs. The distinction is the first thing to get right, because the answer is different for each.
| System | What it learns | What it cannot reliably infer |
|---|---|---|
| Predictive ML | Correlations between inputs and outputs in the training distribution. | Causal direction. The effect of an intervention. What would have happened under a counterfactual. |
| LLMs | Patterns in language and reasoning traces written by other people about other situations. | The causal mechanisms in your specific organization, market, patient population, or supply chain. |
| SCMs | The explicit causal assumptions someone wrote down — variables, arrows, mechanisms, latent confounders, intervention semantics. | The assumptions themselves. Those still have to come from somewhere. |
The third row is the key one. An SCM is a structure for representing causal claims — what causes what, through what mechanism, under what assumptions — but the claims themselves still have to come from somewhere. In practice, they come from a domain expert. The SCM is the artifact that captures their reasoning; it is not a substitute for the reasoning.
This is why the right framing is encoding, not replacement. Predictive ML and LLMs were both pitched at points as ways to bypass the expert. An SCM is the opposite move: a way to capture what the expert knows so that the organization still has access to it after the expert leaves. The reasoning persists; the expert does not have to.
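To make "encoding" concrete, here is a minimal sketch of an SCM as a reviewable artifact. Everything in it is an illustrative assumption: the `Variable` and `SCM` classes, the pricing variables, and the coefficients are hypothetical and do not come from any real engagement or library. The point is only that each arrow and mechanism is written down next to the expert's stated rationale, and that an intervention is an explicit operation on the model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Variable:
    name: str
    parents: List[str]               # incoming arrows
    mechanism: Callable[..., float]  # structural equation: f(*parent_values, noise)
    rationale: str                   # the expert's stated reason, kept for review


@dataclass
class SCM:
    variables: List[Variable]        # must be listed in topological order

    def evaluate(self, noise: Dict[str, float],
                 do: Optional[Dict[str, float]] = None) -> Dict[str, float]:
        """Compute every variable for one unit; `do` overrides mechanisms."""
        do = do or {}
        values: Dict[str, float] = {}
        for var in self.variables:
            if var.name in do:
                # Intervention: ignore the incoming arrows and fix the value.
                values[var.name] = do[var.name]
            else:
                parent_values = [values[p] for p in var.parents]
                values[var.name] = var.mechanism(*parent_values, noise[var.name])
        return values


# Hypothetical toy model: season drives both price and sales.
pricing_scm = SCM([
    Variable("season", [], lambda u: u,
             "Exogenous demand level; treated as given."),
    Variable("price", ["season"], lambda s, u: 10.0 + 2.0 * s + u,
             "Pricing team raises price in high season."),
    Variable("sales", ["season", "price"],
             lambda s, p, u: 100.0 + 20.0 * s - 5.0 * p + u,
             "Demand falls with price and rises with season."),
])

noise = {"season": 1.0, "price": 0.0, "sales": 0.0}
observed = pricing_scm.evaluate(noise)                        # price follows season
intervened = pricing_scm.evaluate(noise, do={"price": 10.0})  # price set by hand
```

With these made-up numbers, the same unit has sales of 60 as observed and 70 under `do(price=10)`. A reviewer who disagrees with a coefficient or an arrow can point at the exact line and its rationale.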
Where the hard work is
A common mistake in evaluating SCMs is to ask whether the math is hard. It usually isn't. The do-calculus is well understood; whether a query is identifiable from a given graph has a textbook answer; estimation, once the query is identified, is a computational problem. The hard work is upstream of all of that.
Specifying an SCM means deciding which nodes exist, which arrows exist, which variables are latent confounders, what time scale the model operates on, and which interventions are even feasible in the system being modeled. Those specification choices are the expertise. An SCM project that skips them and tries to discover structure from data alone runs into a classical problem: many different graphs can explain the same observations equally well (they form what the literature calls a Markov equivalence class), and without interventional data or strong prior knowledge there is no way to tell them apart.
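The problem of several graphs fitting the same data can be seen in a few lines. The sketch below is a toy, assuming a linear-Gaussian pair of variables and using only the Python standard library; the function names are illustrative. It fits the data once as X → Y and once as Y → X and compares the maximized log-likelihoods.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def gaussian_max_loglik(values):
    """Maximized Gaussian log-likelihood of a sample (MLE mean and variance)."""
    n = len(values)
    m = mean(values)
    var = sum((v - m) ** 2 for v in values) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def loglik_for_direction(cause, effect):
    """Log-likelihood of the model cause -> effect with a linear mechanism."""
    mc, me = mean(cause), mean(effect)
    cov = sum((c - mc) * (e - me) for c, e in zip(cause, effect))
    var_c = sum((c - mc) ** 2 for c in cause)
    slope = cov / var_c
    intercept = me - slope * mc
    residuals = [e - (intercept + slope * c) for c, e in zip(cause, effect)]
    # log p(cause) + log p(effect | cause), both Gaussian.
    return gaussian_max_loglik(cause) + gaussian_max_loglik(residuals)


rng = random.Random(0)
# Ground truth used to generate the data: X causes Y.
xs = [rng.gauss(0, 1) for _ in range(2000)]
ys = [2.0 * x + rng.gauss(0, 1) for x in xs]

ll_x_causes_y = loglik_for_direction(xs, ys)
ll_y_causes_x = loglik_for_direction(ys, xs)
print(ll_x_causes_y, ll_y_causes_x)  # the two values agree up to rounding
```

Both directions fit the observations equally well, yet they disagree about what setting X by hand would do to Y. In this setting the arrow has to come from an expert, an experiment, or an extra assumption that is itself brought in from outside the data.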
This is also why "SCMs replace experts" is the wrong frame even in the long run. Causal discovery from observational data alone is underdetermined in general. Some part of the structural specification will always have to come from somewhere outside the data. The question is whether you capture that contribution explicitly, in a form your organization owns, or leave it implicit in someone's head where it cannot be reviewed, audited, or transferred.
Where SCMs do something other frameworks can't
SCMs earn their keep in domains where four conditions hold together: mechanisms are interpretable, interventions matter, data is sparse or non-experimental, and counterfactual reasoning is needed. Pricing, retention, medicine, reliability engineering, supply chains, regulatory policy — these are the domains where the question is rarely "what will happen next?" (which ML handles fine) and usually "what would happen if we did X?" or "would the outcome have been different had we done Y?" Worked case studies in each of these domains are catalogued on Recent Work.
Those second and third questions are interventional and counterfactual. They are exactly what an SCM is built to answer, and they are exactly what no purely predictive system, however sophisticated, can compute. Pearl's three rungs formalize the distinction: association, which is where prediction lives, is Rung 1; intervention is Rung 2; counterfactual reasoning is Rung 3. Most analytics tools live on Rung 1 and reach for the higher rungs by metaphor. An SCM is the object that actually computes them.
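A toy linear model makes the three rungs concrete. This is a sketch under stated assumptions: the structural equations and coefficients below are invented for illustration, `Z` plays the role of a latent confounder, and the counterfactual step assumes the mechanism for `Y` is known, which in practice is exactly what the expert supplies.

```python
import random


def sample_unit(rng, do_x=None):
    """One draw from the toy SCM: Z -> X, Z -> Y, X -> Y (Z is unobserved)."""
    u_z, u_x, u_y = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    z = u_z
    x = z + u_x if do_x is None else do_x   # do(X) replaces X's mechanism
    y = 2.0 * x + 3.0 * z + u_y             # true effect of X on Y is 2
    return x, y


def ols_slope(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


# Rung 1, association: regress Y on X in observational data.
rng = random.Random(0)
observational = [sample_unit(rng) for _ in range(20000)]
assoc_slope = ols_slope([x for x, _ in observational],
                        [y for _, y in observational])


# Rung 2, intervention: compare E[Y | do(X=1)] with E[Y | do(X=0)].
def mean_y_under_do(x_value, n=20000, seed=1):
    rng = random.Random(seed)  # same seed, so both arms share the same units
    return sum(sample_unit(rng, do_x=x_value)[1] for _ in range(n)) / n


interventional_effect = mean_y_under_do(1.0) - mean_y_under_do(0.0)


# Rung 3, counterfactual: for one observed unit, what would Y have been?
def counterfactual_y(x_obs, y_obs, x_cf):
    background = y_obs - 2.0 * x_obs   # abduction: recover 3*Z + U_y
    return 2.0 * x_cf + background     # action and prediction


y_cf = counterfactual_y(x_obs=1.0, y_obs=7.0, x_cf=0.0)
```

In this toy the regression slope comes out near 3.5 because the confounder inflates it, the interventional contrast recovers the true effect of 2, and the unit observed at X = 1, Y = 7 would have had Y = 5 had X been 0. Only the first number is available to a purely predictive system.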
For the constructive architecture (how an LLM interface and an SCM engine compose into a system that answers questions an LLM alone cannot), see the architecture section on "Why not an LLM?"
Where SCMs struggle
Real systems often violate the clean assumptions an SCM rests on. Feedback loops, adaptive behavior, strategic agents, regime changes, hidden state, time-varying parameters — none of these fit a static graph cleanly. An expert reasoning about a real system carries tacit knowledge of when these conditions hold, when they don't, and what to do about it. That tacit layer is genuinely hard to formalize, and an SCM that pretends it doesn't exist will fail in deployment.
There is also a piece an SCM doesn't even try to provide: the goal. Experts optimize for a mixture of business outcomes, ethical constraints, robustness, organizational politics, and implementation cost. An SCM gives you a causal model of the world; it does not tell you what to do with it. That layer still needs a human, or at minimum an explicit utility function someone has to write down. See "The work the library cannot do" for the longer treatment of what remains genuinely human even after the library is built.
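The gap between a causal model and a decision can be shown with a small sketch. The demand mechanism, the candidate prices, and both utility functions below are hypothetical; the only claim is that the same model, queried under the same interventions, recommends different actions once someone writes down a different objective.

```python
def sales_under_do_price(price):
    """Toy mechanism standing in for an SCM query: E[sales | do(price)]."""
    return 100.0 - 5.0 * price


def revenue_utility(price, sales):
    return price * sales


def volume_weighted_utility(price, sales):
    # Hypothetical objective that also rewards units sold (e.g. market share).
    return price * sales + 10.0 * sales


def best_action(candidates, utility):
    """Pick the candidate intervention with the highest utility."""
    return max(candidates, key=lambda p: utility(p, sales_under_do_price(p)))


candidate_prices = [8.0, 10.0, 12.0]
best_for_revenue = best_action(candidate_prices, revenue_utility)
best_for_volume = best_action(candidate_prices, volume_weighted_utility)
```

Here the revenue objective picks a price of 10 and the volume-weighted objective picks 8, from an identical causal model. Choosing between the two objectives is not something the model can do.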
The credible direction is hybrid
The strongest near-term architecture is not SCM-alone. It is a stack: an LLM at the interface, an SCM at the engine, real data calibrating the parameters, and simulation evaluating proposed interventions before they happen in the world. Each component does the job it is genuinely good at.
- The LLM proposes structure, surfaces mechanisms from unstructured organizational knowledge, and translates between plain-language stakeholder questions and formal causal queries.
- The SCM provides the formal semantics for causal claims. It is the thing that actually runs the do-calculus, checks identifiability, and produces estimates with bounds and sensitivity profiles.
- Data calibrates the parameters of the structures the LLM and the experts have proposed.
- Simulation evaluates interventions and counterfactuals before any commitment is made in the world.
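One narrow piece of this stack, the hand-off from proposal to formal model, can be sketched as follows. This is an assumption about how such a hand-off might look, not a description of any particular system: `propose_edges_stub` stands in for an LLM call (no model is invoked), and the variable names and the expert's rejection are invented. The formal layer's job here is only to refuse a structure that is not a valid acyclic graph.

```python
from collections import defaultdict, deque
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def propose_edges_stub() -> List[Edge]:
    """Placeholder for an LLM proposal step; returns hard-coded edges."""
    return [("season", "price"), ("season", "sales"),
            ("price", "sales"), ("sales", "price")]


def is_acyclic(edges: Iterable[Edge]) -> bool:
    """Kahn's algorithm: a graph is a DAG iff every node can be peeled off."""
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes: Set[str] = set()
    for cause, effect in edges:
        children[cause].append(effect)
        indegree[effect] += 1
        nodes.update((cause, effect))
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return visited == len(nodes)


def apply_expert_review(proposed: List[Edge], rejected: Set[Edge]) -> List[Edge]:
    """Keep only the proposed edges the expert did not reject."""
    return [edge for edge in proposed if edge not in rejected]


proposed = propose_edges_stub()
reviewed = apply_expert_review(proposed, rejected={("sales", "price")})
proposal_is_dag = is_acyclic(proposed)   # False: price and sales form a cycle
reviewed_is_dag = is_acyclic(reviewed)   # True once the expert removes one arrow
```

The design choice worth noting is that the proposal step is allowed to be wrong. The formal layer and the expert review are what turn a plausible-sounding structure into one that can be queried.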
This stack — LLM as interface, SCM as engine, data and simulation as ground — is the first credible architecture for machine subject matter expertise in narrow, stable domains. It is not a replacement for an expert. It is a form of expert knowledge that persists after the expert is gone, can be challenged by a reviewer because the assumptions are explicit, and can answer questions the original expert was never asked.
That is a meaningful thing for an organization to have. It is not the same thing the original question was asking about.
The honest summary
SCMs do not replace domain experts. They encode what experts know, in a form the organization owns, can audit, and can query without the expert in the room. The replacement narrative was always the wrong frame. The encoding narrative is the right one — and SCMs are among the few frameworks that can do it.