A causal model is not a better predictive model. It is a different kind of object: a formal representation of how a system works, built from expert knowledge, that can answer questions no amount of data alone can reach.

Data tells you what is correlated. It cannot tell you which direction causation runs, whether a third variable is driving both, or what would happen if you intervened. Two variables can be perfectly correlated whether A causes B, B causes A, or both are caused by a third variable entirely. The data looks identical in all three cases.

This is not a limitation of current statistical methods — it is a mathematical impossibility. The same joint probability distribution over a set of variables is consistent with multiple causal structures. Deciding which one is correct requires someone who understands the mechanism: a domain expert who knows not just that two variables are associated, but why they are.
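
A minimal sketch of that impossibility uses two hypothetical linear-Gaussian models with coefficients chosen purely for illustration. In model A, X causes Y. In model B, Y causes X. Both imply exactly the same joint distribution of (X, Y): zero-mean Gaussian with the same covariance matrix. Yet they disagree about the intervention that sets X to 1, written do(X = 1).

```python
from dataclasses import dataclass


@dataclass
class LinearGaussianSCM:
    """Two-variable linear SCM with independent zero-mean Gaussian noise.

    forward=True:  X := e_x,  Y := coef * X + e_y
    forward=False: Y := e_y,  X := coef * Y + e_x
    noise_var_x / noise_var_y are the variances of e_x and e_y.
    """
    forward: bool
    coef: float
    noise_var_x: float
    noise_var_y: float

    def covariance(self):
        """Exact observational covariance matrix of (X, Y)."""
        if self.forward:
            var_x = self.noise_var_x
            cov_xy = self.coef * var_x
            var_y = self.coef ** 2 * var_x + self.noise_var_y
        else:
            var_y = self.noise_var_y
            cov_xy = self.coef * var_y
            var_x = self.coef ** 2 * var_y + self.noise_var_x
        return ((var_x, cov_xy), (cov_xy, var_y))

    def mean_y_under_do_x(self, x):
        """E[Y | do(X = x)]: X's equation is replaced by the constant x."""
        if self.forward:
            return self.coef * x  # Y's equation still reads X
        return 0.0                # Y's equation ignores X; E[e_y] = 0


# Model A: X -> Y.  Model B: Y -> X.  Hypothetical parameters.
model_a = LinearGaussianSCM(forward=True, coef=1.0, noise_var_x=1.0, noise_var_y=1.0)
model_b = LinearGaussianSCM(forward=False, coef=0.5, noise_var_x=0.5, noise_var_y=2.0)

print(model_a.covariance())            # ((1.0, 1.0), (1.0, 2.0))
print(model_b.covariance())            # ((1.0, 1.0), (1.0, 2.0))
print(model_a.mean_y_under_do_x(1.0))  # 1.0
print(model_b.mean_y_under_do_x(1.0))  # 0.0
```

Both distributions are zero-mean Gaussian, so equal covariance matrices mean identical joint distributions. No sample size can separate the two models, but they give different answers to the interventional question.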

That is the gap. The questions boards actually ask — did this control work? would a different decision have produced a different outcome? what would happen to this specific case if we changed this variable? — all require causal structure. No correlation-based tool can answer them, regardless of how much data it has access to.

The most sophisticated version of the observational-statistical apparatus — copulas, extreme value theory, multivariate GARCH, coherent risk measures — is developed in McNeil, Frey & Embrechts, Quantitative Risk Management (Princeton, 2005). None of it can answer “what would happen if we intervened?” That requires a different class of model.

Can't AI learn the causal structure from the data itself?
Norman Fenton argues directly that it cannot: machine learning from data alone cannot resolve which causal structure underlies an observed correlation, and the gap is mathematical, not technical. See his talk "When machine learning from data is doomed to fail: causation without correlation" (Professor Norman Fenton, YouTube).

LLMs are not causal models

LLMs generate text about cause and effect fluently — but generating text that sounds like causal reasoning is not the same as computing causal quantities. An LLM that tells you "implementing this control will reduce risk by 40%" is producing text shaped like a causal answer. It is not running the calculation. The number has no formal meaning, cannot be verified, and cannot be audited.

The correct architecture uses an LLM as the interface to a causal model. The LLM translates plain-language questions into formal queries, the causal model computes the answer, and the LLM returns that result in plain language. The LLM does not replace the causal model.
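
This minimal sketch shows that division of labour under stated assumptions. `parse_question` is a hypothetical stand-in for the LLM: it is a trivial keyword matcher, not a real LLM API. The causal model is a toy set of structural equations with made-up variables and coefficients. What matters is where the number comes from. The language layer only produces a structured query, and the model computes the answer.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Toy structural equations (hypothetical variables and coefficients).
# Each function computes one variable from the values computed before it.
EQUATIONS: Dict[str, Callable[[Dict[str, float]], float]] = {
    "control_strength": lambda v: 0.2,
    "incident_rate": lambda v: 10.0 * (1.0 - v["control_strength"]),
    "expected_loss": lambda v: 5000.0 * v["incident_rate"],
}
ORDER = ["control_strength", "incident_rate", "expected_loss"]  # causes before effects


@dataclass
class InterventionQuery:
    """Formal query: the value of `target` under do(variable = value)."""
    variable: str
    value: float
    target: str


def evaluate(interventions: Dict[str, float]) -> Dict[str, float]:
    """Evaluate the model, overriding the equation of any intervened variable."""
    values: Dict[str, float] = {}
    for name in ORDER:
        if name in interventions:
            values[name] = interventions[name]  # do(): equation replaced by a constant
        else:
            values[name] = EQUATIONS[name](values)
    return values


def parse_question(question: str) -> InterventionQuery:
    """HYPOTHETICAL stand-in for the LLM: plain language -> formal query."""
    if "strengthen" in question.lower():
        return InterventionQuery("control_strength", 0.6, "expected_loss")
    raise ValueError("question not understood")


def answer(question: str) -> str:
    query = parse_question(question)                               # language layer
    result = evaluate({query.variable: query.value})[query.target]  # causal model
    return f"Under do({query.variable} = {query.value}), {query.target} = {result:,.0f}"


print(answer("What if we strengthen the control?"))
# Under do(control_strength = 0.6), expected_loss = 20,000
```

Because the figure is produced by `evaluate`, it can be traced back to specific equations and audited. In a real system, the LLM would be constrained to emit queries the model can validate.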

Explainability tools are not causal models

SHAP values, LIME, and saliency maps tell you which input features a predictive model relied on to produce a prediction. They do not produce causal answers, because the model they explain is itself a correlation model, and an explanation of a correlation model is still a statement about correlations. A feature importance score tells you what the model weighted. It does not tell you what caused the outcome, or what would happen if you intervened.
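
A small simulation makes the gap concrete. It assumes a hypothetical confounded system in which Z causes both X and Y, and X has no effect on Y at all. A predictive model of Y from X gives X a large weight. Here the least-squares slope stands in for a feature-importance score, which is an illustrative simplification. Yet intervening on X changes nothing.

```python
import random


def simulate(n, seed=0, do_x=None):
    """Sample from Z -> X and Z -> Y (there is no X -> Y arrow).

    If do_x is given, X's equation is overridden by that constant.
    The same random draws are made either way, so runs with equal
    seeds share the same Z and noise values.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        x_noise = rng.gauss(0.0, 0.5)
        y_noise = rng.gauss(0.0, 1.0)
        x = z + x_noise if do_x is None else do_x
        y = 2.0 * z + y_noise  # Y's equation does not mention X
        xs.append(x)
        ys.append(y)
    return xs, ys


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs: Cov(X, Y) / Var(X)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var = sum((x - mx) ** 2 for x in xs) / n
    return cov / var


xs, ys = simulate(100_000)
slope = ols_slope(xs, ys)
print(round(slope, 2))  # close to Cov/Var = 2 / 1.25 = 1.6

# Interventional contrast, with identical background draws in both runs.
_, y_do0 = simulate(100_000, do_x=0.0)
_, y_do1 = simulate(100_000, do_x=1.0)
effect = sum(y_do1) / len(y_do1) - sum(y_do0) / len(y_do0)
print(effect)  # 0.0: setting X does nothing to Y
```

The model is right to weight X, because X predicts Y. Any explanation of that model will faithfully report the weight. Neither the weight nor its explanation says what setting X would do.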

The causal graph
A map of which variables cause which others, not which variables are correlated. Drawn by domain experts, not inferred from data. Every arrow is a causal claim the expert is willing to defend: this variable causes that one, through this mechanism. In general, the direction of an arrow cannot be determined from observational data alone. The expert must decide.
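
One way to hold such a graph in code is as a list of explicit arrows, each carrying the mechanism the expert gave for it, so that the structure documents itself. This sketch uses hypothetical insurance variables. It relies on `graphlib` from the Python standard library (3.9+) to check that the claims form a directed acyclic graph and to produce a causal order.

```python
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter


@dataclass(frozen=True)
class Arrow:
    cause: str
    effect: str
    mechanism: str  # the expert's stated reason for this arrow


# Hypothetical elicited claims; every edge names its mechanism.
ARROWS = [
    Arrow("flood_proximity", "claim_frequency", "properties near water flood more often"),
    Arrow("flood_defences", "claim_frequency", "defences reduce water ingress"),
    Arrow("flood_proximity", "premium", "underwriters price location risk"),
    Arrow("claim_frequency", "loss_ratio", "more claims raise paid losses"),
    Arrow("premium", "loss_ratio", "premium is the denominator of the ratio"),
]


def parents(arrows):
    """Map each variable to the set of its direct causes."""
    graph = {}
    for a in arrows:
        graph.setdefault(a.effect, set()).add(a.cause)
        graph.setdefault(a.cause, set())
    return graph


def causal_order(arrows):
    """Return variables with causes before effects; raise if the claims form a cycle."""
    try:
        return list(TopologicalSorter(parents(arrows)).static_order())
    except CycleError as err:
        raise ValueError(f"causal claims contain a cycle: {err.args[1]}") from err


order = causal_order(ARROWS)
print(order)
for a in ARROWS:
    print(f"{a.cause} -> {a.effect}: {a.mechanism}")
```
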
Structural equations
For each variable, a rule that defines how it is produced from its causes. The equations encode mechanism — not association. They are what make intervention and counterfactual queries computable: setting a variable directly (intervention) overrides its equation; reasoning about what a specific individual would have experienced under different conditions (counterfactual) holds their individual background fixed while changing one input.
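
This minimal sketch shows both operations on a hypothetical three-variable linear model. Intervention replaces an equation with a constant. A counterfactual for one specific case follows Pearl's three steps. First, infer that case's background terms from what was observed (abduction). Second, override the equation (action). Third, recompute with the background held fixed (prediction). Because each equation here is linear with additive noise, the abduction step is exact.

```python
from typing import Dict


def structural_equations(u: Dict[str, float], do: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical equations X -> M -> Y and X -> Y, each with its own background term.

    Variables are computed in causal order; any variable in `do` is set, not computed.
    """
    v: Dict[str, float] = {}
    v["X"] = do.get("X", u["X"])
    v["M"] = do.get("M", 0.5 * v["X"] + u["M"])
    v["Y"] = do.get("Y", 2.0 * v["M"] + 1.0 * v["X"] + u["Y"])
    return v


def abduce(observed: Dict[str, float]) -> Dict[str, float]:
    """Abduction: invert each equation to recover this case's background terms."""
    return {
        "X": observed["X"],
        "M": observed["M"] - 0.5 * observed["X"],
        "Y": observed["Y"] - 2.0 * observed["M"] - 1.0 * observed["X"],
    }


def counterfactual(observed: Dict[str, float], do: Dict[str, float]) -> Dict[str, float]:
    """What this specific case would have looked like under `do`, background held fixed."""
    u = abduce(observed)                # 1. abduction
    return structural_equations(u, do)  # 2. action + 3. prediction


case = {"X": 4.0, "M": 3.0, "Y": 12.0}
print(abduce(case))                      # {'X': 4.0, 'M': 1.0, 'Y': 2.0}
print(counterfactual(case, {"X": 0.0}))  # {'X': 0.0, 'M': 1.0, 'Y': 4.0}
```

Compare this with the population-level intervention. If the background terms average zero, E[Y | do(X = 0)] is 0. This particular case, with its own background, would still have had Y = 4.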
Probability tables
For each variable, a probability distribution over its values given the values of its causes. These quantify the strength of each causal relationship. They can be estimated from data, elicited from experts, or both — and updated as new evidence arrives. The graph tells you what causes what. The tables tell you by how much.
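
This sketch illustrates "elicited from experts, or both, and updated as new evidence arrives" using one common approach, which is not necessarily the author's. The expert's table is expressed as Dirichlet pseudo-counts, so the expert's belief carries the weight of a stated number of imagined cases. Observed counts are then added to it. The variables, probabilities, counts, and the weight of 20 are all hypothetical.

```python
from typing import Dict

STATES = ("low", "high")  # hypothetical states of claim_frequency

# Expert's elicited table P(claim_frequency | flood_proximity), hypothetical numbers.
expert_cpt = {
    "near": {"low": 0.25, "high": 0.75},
    "far": {"low": 0.875, "high": 0.125},
}
EXPERT_WEIGHT = 20  # assumption: each row of the expert's table counts as 20 cases


def update_cpt(prior: Dict[str, Dict[str, float]],
               counts: Dict[str, Dict[str, int]],
               weight: float) -> Dict[str, Dict[str, float]]:
    """Dirichlet-multinomial update, row by row.

    Posterior mean for state s = (weight * prior[s] + count[s]) / (weight + n).
    Rows with no new evidence keep the expert's values.
    """
    posterior = {}
    for parent_state, row in prior.items():
        row_counts = counts.get(parent_state, {})
        n = sum(row_counts.values())
        posterior[parent_state] = {
            s: (weight * row[s] + row_counts.get(s, 0)) / (weight + n) for s in STATES
        }
    return posterior


# Hypothetical evidence: 80 near-water policies, 65 with high claim frequency.
observed = {"near": {"low": 15, "high": 65}}
posterior = update_cpt(expert_cpt, observed, EXPERT_WEIGHT)
print(posterior["near"])  # {'low': 0.2, 'high': 0.8}
print(posterior["far"])   # {'low': 0.875, 'high': 0.125}: no new evidence for this row
```

The graph is untouched by the update. Only the strengths change, which mirrors the split between "what causes what" and "by how much".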

Humans reason causally by default. A domain expert who has spent twenty years in insurance underwriting does not describe correlations — they describe mechanisms. They know that flood proximity causes claim frequency, not merely that the two are associated. They know which direction the arrow runs, and why. That knowledge is exactly what the causal graph encodes.

Getting that knowledge out of experts' heads and into a causal graph is called knowledge elicitation. Three things make it harder than it sounds:

Arrow direction
An arrow from A to B is a causal claim: A causes B — not that they are correlated, but that changing A would change B. Two perfectly correlated variables could be causally related in either direction, or both could be driven by a third variable entirely. The data looks identical in all three cases. Getting the direction wrong produces a model that is formally wrong regardless of how well it fits the historical record.
Confounders
A confounder is a variable that causes both the input and the output, creating an association between them that is not causal. If the expert omits it from the graph, the model attributes the confounder's influence to the input, and its estimate of that input's effect is biased in a predictable direction, set by how the confounder acts on each. Identifying confounders requires domain knowledge that is not in the data.
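
This sketch shows what omitting a confounder does, using hypothetical probability tables with three binary variables: a confounder Z (say, an old property), an input X (a control is in place), and an outcome Y (a claim is made). Conditioning on X alone mixes in Z's influence. The back-door adjustment sums over Z weighted by P(Z), and it recovers P(Y | do(X)) when Z is the only confounder.

```python
# Hypothetical tables. Old properties (Z = 1) rarely have the control
# and are riskier; the control (X = 1) lowers risk by 0.1 at each Z.
P_Z1 = 0.5
P_X1_GIVEN_Z = {0: 0.8, 1: 0.2}
P_Y1_GIVEN_XZ = {(0, 0): 0.2, (1, 0): 0.1, (0, 1): 0.6, (1, 1): 0.5}


def p_z(z):
    return P_Z1 if z == 1 else 1 - P_Z1


def p_x_given_z(x, z):
    p = P_X1_GIVEN_Z[z]
    return p if x == 1 else 1 - p


def p_y1_given_x(x):
    """Observational P(Y=1 | X=x): Z is weighted by P(Z | X=x)."""
    joint = {z: p_z(z) * p_x_given_z(x, z) for z in (0, 1)}
    total = sum(joint.values())
    return sum(P_Y1_GIVEN_XZ[(x, z)] * joint[z] / total for z in (0, 1))


def p_y1_do_x(x):
    """Back-door adjustment: P(Y=1 | do(X=x)) = sum_z P(Y=1 | x, z) P(z)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * p_z(z) for z in (0, 1))


naive = p_y1_given_x(1) - p_y1_given_x(0)
adjusted = p_y1_do_x(1) - p_y1_do_x(0)
print(round(naive, 2), round(adjusted, 2))  # -0.34 -0.1
```

The naive comparison more than triples the control's apparent benefit. The direction of the bias is predictable: the control sits mostly on newer, lower-risk properties, so it gets credit for their lower risk.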
Expert disagreement
When two experts draw different causal graphs, the disagreement is itself information — it identifies precisely where causal knowledge is uncertain and where gathering more evidence would change the model. A well-run elicitation surfaces the disagreement and encodes it honestly rather than averaging it away.
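
This sketch surfaces disagreement rather than averaging it away. Each expert's graph is represented as a set of (cause, effect) arrows, and the comparison reports where the experts agree, where only one drew an arrow, and where they drew the same link in opposite directions. Both graphs are hypothetical.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)


def compare_graphs(a: Set[Edge], b: Set[Edge]) -> Dict[str, Set[Edge]]:
    """Classify arrows from two experts' graphs.

    "reversed" lists each contested link in expert A's orientation.
    """
    return {
        "agreed": a & b,
        "reversed": {(c, e) for (c, e) in a if (e, c) in b},
        "only_a": {(c, e) for (c, e) in a - b if (e, c) not in b},
        "only_b": {(c, e) for (c, e) in b - a if (e, c) not in a},
    }


expert_a = {("flood_proximity", "claim_frequency"),
            ("maintenance", "claim_frequency"),
            ("claim_frequency", "premium")}
expert_b = {("flood_proximity", "claim_frequency"),
            ("premium", "claim_frequency"),
            ("building_age", "maintenance")}

report = compare_graphs(expert_a, expert_b)
for kind, edges in report.items():
    print(kind, sorted(edges))
```

Each non-agreed entry is a specific, checkable question to put back to the experts or to test with new evidence.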

The result is a graph that domain experts can read, interrogate, and defend — because every arrow in it represents a claim they made. This is what makes causal models auditable in a way that trained models are not: the structure is its own documentation.

Next Step

Building a library of Structural Causal Models (SCMs) from your domain experts’ knowledge — and making it queryable in plain language using an LLM — is an engagement.


Pearl, J., 2009, Causality: Models, Reasoning, and Inference (2nd ed.), Cambridge University Press · Pearl, J. & Mackenzie, D., 2018, The Book of Why, Basic Books · Peters, J., Janzing, D. & Schölkopf, B., 2017, Elements of Causal Inference, MIT Press