Pearl's Ladder of Causality

What Validation Tests

Validation of a causal model operates at three levels. Structural validation tests whether the graph’s independence claims are consistent with the data. Parameter validation tests whether the conditional probability tables quantify the relationships correctly. Predictive validation tests whether the model’s forward inference produces calibrated probability distributions over observed outcomes.

These three tests are distinct. A model can pass predictive validation while failing structural validation: if the confounders are wrong but the predictions are still accurate on the training distribution, predictive metrics will not catch the structural error. The error surfaces only when the model is used for intervention queries, where it produces wrong answers with high confidence.
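A small simulation can make this gap concrete. The sketch below assumes a hypothetical three-variable world in which Z causes both X and Y and X has no effect on Y; the variable names, probabilities, and function names are illustrative and not taken from any real model. A graph that omits Z and draws X → Y reproduces P(Y | X) just as well, so predictive metrics cannot tell the two apart, but it reads the observational difference as the effect of intervening on X.

```python
import random

def simulate(n=100_000, seed=0):
    """Hypothetical world: Z causes both X and Y; X has no effect on Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5
        x = rng.random() < (0.8 if z else 0.2)
        y = rng.random() < (0.7 if z else 0.1)
        rows.append((z, x, y))
    return rows

def p_y_given_x(rows, x_val):
    """Observational P(Y=1 | X=x): the quantity predictive validation checks."""
    ys = [y for _, x, y in rows if x == x_val]
    return sum(ys) / len(ys)

def p_y_do_x(rows, x_val):
    """Backdoor adjustment over Z: sum_z P(Y=1 | X=x, Z=z) * P(Z=z)."""
    total = 0.0
    for z_val in (False, True):
        p_z = sum(1 for z, _, _ in rows if z == z_val) / len(rows)
        ys = [y for z, x, y in rows if z == z_val and x == x_val]
        total += p_z * sum(ys) / len(ys)
    return total

rows = simulate()
# What the wrong graph (X -> Y, no Z) would report as the effect of setting X.
naive_effect = p_y_given_x(rows, True) - p_y_given_x(rows, False)
# What the correct graph reports after adjusting for the confounder Z.
adjusted_effect = p_y_do_x(rows, True) - p_y_do_x(rows, False)
print(f"naive: {naive_effect:.3f}  adjusted: {adjusted_effect:.3f}")
```

Under these assumed probabilities the naive difference is about 0.36 in expectation while the adjusted effect is about zero, which is the kind of confident wrong answer the structural error produces.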

Structural Validation

The graph implies a set of conditional independence relationships — d-separation tells you which pairs of variables should be independent given which conditioning sets. These predictions are testable against data. For each implied independence, test whether the data is consistent with independence given the conditioning set. Violations are signals of missing nodes, missing arrows, or incorrect arrow directions.
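One way to test a single implied independence is sketched below. It assumes continuous, roughly linear-Gaussian variables and uses a partial-correlation test with the Fisher z-transform; for discrete CPT-based models a stratified chi-square or G-test would be the more natural choice. The chain X → Z → Y, the sample size, and the helper names are hypothetical, chosen so that the graph implies X is independent of Y given Z.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of a single z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def fisher_z_pvalue(r, n, n_cond):
    """Two-sided p-value for H0: (partial) correlation is zero."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(n - n_cond - 3)
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data from the chain X -> Z -> Y.
rng = random.Random(42)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
z = [xi + rng.gauss(0, 1) for xi in x]
y = [zi + rng.gauss(0, 1) for zi in z]

r_marginal = pearson(x, y)
r_partial = partial_corr(x, y, z)
p_marginal = fisher_z_pvalue(r_marginal, n, 0)
p_partial = fisher_z_pvalue(r_partial, n, 1)
print(f"X,Y: r={r_marginal:.3f}  X,Y|Z: r={r_partial:.3f} (p={p_partial:.3f})")
```

A small p-value for an independence the graph claims is the violation signal described above. With many implied independencies, some correction for multiple testing is advisable before treating any single rejection as evidence of a structural error.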

Expert review is the other structural validation mechanism. For each arrow in the graph, the domain expert should be able to describe the mechanism: how does this cause produce this effect, through what process, under what conditions? An arrow whose mechanism cannot be described is a structural assumption that has not been examined.

Parameter Validation

The conditional probability tables can be validated by comparing the model’s predictions with held-out data from the same distribution. For each node, the model predicts P(X | parents(X)), and checking that prediction is a standard calibration test. A CPT is well calibrated if, among the cases where it assigns probability f to X = x, the outcome X = x actually occurs in a fraction f of them, for every f.
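The calibration check can be sketched as a reliability table: bin the held-out cases by the probability the model assigned, then compare each bin's mean predicted probability with the observed frequency. The function names, the ten equal-width bins, and the simulated data below are assumptions made for illustration, not part of any particular library.

```python
import random

def reliability_table(probs, outcomes, n_bins=10):
    """Group cases by predicted probability; return (mean_pred, observed_freq, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, o))
    table = []
    for cell in bins:
        if cell:
            mean_p = sum(p for p, _ in cell) / len(cell)
            freq = sum(o for _, o in cell) / len(cell)
            table.append((mean_p, freq, len(cell)))
    return table

def expected_calibration_error(table):
    """Count-weighted average gap between predicted probability and observed frequency."""
    total = sum(n for _, _, n in table)
    return sum(n * abs(mean_p - freq) for mean_p, freq, n in table) / total

# Simulated held-out data: the same predictions scored against two hypothetical worlds.
rng = random.Random(1)
probs = [rng.random() for _ in range(20_000)]
# World A: outcomes occur at exactly the predicted rate (model is calibrated).
calibrated = [1 if rng.random() < p else 0 for p in probs]
# World B: true rates are pulled toward 0.5 (model is overconfident).
overconfident = [1 if rng.random() < 0.25 + 0.5 * p else 0 for p in probs]

ece_good = expected_calibration_error(reliability_table(probs, calibrated))
ece_bad = expected_calibration_error(reliability_table(probs, overconfident))
print(f"calibrated ECE: {ece_good:.3f}  overconfident ECE: {ece_bad:.3f}")
```

For a CPT, `probs` would be the entries P(X = x | parents) looked up for each held-out case and `outcomes` would record whether X = x occurred. Rows with few cases give noisy frequencies, so sparse parent configurations deserve wider tolerance.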

Expert disagreement with model outputs is also a validation signal. If an expert consistently finds the model’s posterior probabilities implausible — the model says 12% failure probability and the expert says it should be 40% — either the CPT is miscalibrated or the graph is missing a variable the expert knows about. Both are worth investigating. See When Experts Disagree.

Drift Detection

A model that was correct at build time can become incorrect as the system it models changes. Drift detection monitors the model’s predictions against ongoing observations and flags when the predictive accuracy has degraded beyond a threshold.

The causal structure of drift is often informative: if the model’s predictions are consistently wrong in a specific subgraph — accurate elsewhere but degraded in the nodes related to one variable — the variable whose mechanism has changed is usually the one at the root of the degraded subgraph. The graph makes the location of the drift visible. Revalidation can focus on the affected subgraph rather than the entire model.
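The localization step can be sketched as a small graph query: flag the nodes whose recent prediction error exceeds their baseline by more than a threshold, then report the flagged nodes that have no flagged parent. The graph, node names, error values, and the 0.05 threshold below are all made up for illustration; how per-node error is measured is left open.

```python
def degraded_nodes(baseline_error, recent_error, threshold=0.05):
    """Nodes whose recent error exceeds baseline by more than the threshold."""
    return {
        node for node in baseline_error
        if recent_error[node] - baseline_error[node] > threshold
    }

def drift_roots(parents, degraded):
    """Degraded nodes with no degraded parent: candidates for the changed mechanism."""
    return sorted(
        node for node in degraded
        if not any(p in degraded for p in parents.get(node, ()))
    )

# Hypothetical graph, given as node -> list of parents.
parents = {
    "load": [],
    "ambient": [],
    "vibration": ["load"],
    "bearing_temp": ["vibration", "ambient"],
    "failure": ["bearing_temp"],
}
# Hypothetical per-node prediction error at build time and over a recent window.
baseline_error = {"load": 0.10, "ambient": 0.10, "vibration": 0.10,
                  "bearing_temp": 0.10, "failure": 0.10}
recent_error = {"load": 0.10, "ambient": 0.11, "vibration": 0.22,
                "bearing_temp": 0.19, "failure": 0.18}

degraded = degraded_nodes(baseline_error, recent_error)
roots = drift_roots(parents, degraded)
print("degraded:", sorted(degraded))
print("revalidate first:", roots)
```

In this made-up case the degraded subgraph is vibration, bearing_temp, and failure, and the query points at vibration as the place to start revalidation. More than one root in the result would suggest several mechanisms have changed independently.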

The Engagement

A causal model that has never been validated is a hypothesis. Validation makes it a model. The engagement is a thirty-minute session to assess which elements of your current model have been tested and which have not.

info@rung3.ai
