The Exchange
If you have correlated but not causal data, you can only answer Rung 1 questions. But if you add causation then you’ve added value — and if the causal model has confounders, then the do() operator is distinguishable from the obs() operator. But if confounding is absent, they are not. This applies to Structural Causal Models as well.
Correct — and worth stating precisely. Three claims are bundled in the question, each independently important:
1. Correlation → Rung 1 only
With purely observational data and no causal structure, you can answer only: what tends to co-occur with what? Formally: P(Y | X). Every regression, prediction, dashboard, and heat map is the same operation run on different variables. This is Rung 1 — seeing.
2. Adding causal structure → enables Rung 2
A causal model — a DAG or SCM — encodes structural assumptions that cannot be read off the data alone. With that structure in place, the do() operator becomes meaningful: P(Y | do(X)) is the distribution of Y when X is set rather than observed. The distinction is what makes a causal model a decision tool rather than a reporting tool.
3. Confounding determines whether do() ≠ obs()
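When confounders are present, that is, when a variable Z causes both X and Y, the observational distribution is contaminated by that common cause, and in general P(Y | do(X)) ≠ P(Y | X). Equality is guaranteed only in the unconfounded case. Formally: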
P(Y | do(X)) = P(Y | X) [when no backdoor paths exist]
The backdoor criterion identifies which variables to adjust for in order to recover the interventional distribution from observational data. If the graph contains no backdoor path from X to Y, the observational and interventional distributions coincide outright. If backdoor paths exist but are blocked by conditioning on an admissible set Z, the two coincide within each stratum of Z, and P(Y | do(X)) is recovered by averaging those strata over P(Z) rather than read directly off P(Y | X). In either case the result holds only when the graph is correctly specified, selection bias is absent, and no collider has been inadvertently conditioned on.
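The recovery the backdoor criterion licenses can be written out for a discrete adjustment set Z. The first line is the law of total probability applied to the observational distribution; the second is the backdoor adjustment formula, valid when Z satisfies the criterion. The only difference between them is the weight on each stratum: P(Z = z | X = x) when observing, P(Z = z) when intervening. When no backdoor path exists, Z can be taken as empty and both lines reduce to P(Y | do(X)) = P(Y | X).

```latex
\begin{aligned}
P(Y \mid X = x)     &= \sum_{z} P(Y \mid X = x, Z = z)\, P(Z = z \mid X = x) && \text{(observational, Rung 1)} \\
P(Y \mid do(X = x)) &= \sum_{z} P(Y \mid X = x, Z = z)\, P(Z = z)             && \text{(interventional, Rung 2)}
\end{aligned}
```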
The point extends directly to SCMs: in an SCM where treatment and outcome share no common cause, observed or unobserved, so that no backdoor path exists, the observational and interventional queries return the same answer. The practical implication is that in domains where confounding is known to be absent, the additional apparatus of the do-calculus is not necessary. The caution is that such domains are rare.
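A minimal simulation makes the three claims concrete. It assumes a hypothetical binary SCM, with all probabilities invented for illustration, in which X raises P(Y = 1) by 0.1. In the confounded version a variable Z also raises both the chance of X and the chance of Y; in the unconfounded version Z plays no role. The helper names (`simulate`, `observational_effect`, `adjusted_effect`, `interventional_effect`) are illustrative, not from any library.

```python
import random

def simulate(n, confounded, do_x=None, seed=0):
    """Draw (z, x, y) rows from a hypothetical binary SCM.

    Z -> X and Z -> Y only when confounded; X -> Y always (+0.1 on P(Y=1)).
    Passing do_x replaces X's structural equation with the constant do_x.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        if do_x is None:
            p_x = (0.8 if z else 0.2) if confounded else 0.5
            x = 1 if rng.random() < p_x else 0
        else:
            x = do_x  # the do() operator: X is set, not observed
        p_y = 0.2 + 0.1 * x + (0.5 * z if confounded else 0.0)
        y = 1 if rng.random() < p_y else 0
        rows.append((z, x, y))
    return rows

def mean_y(rows, x=None, z=None):
    """Mean of Y, optionally restricted to rows with the given X and/or Z."""
    sel = [r for r in rows if (x is None or r[1] == x) and (z is None or r[0] == z)]
    return sum(r[2] for r in sel) / len(sel)

def observational_effect(rows):
    """Rung 1: E[Y | X=1] - E[Y | X=0]."""
    return mean_y(rows, x=1) - mean_y(rows, x=0)

def adjusted_effect(rows):
    """Backdoor adjustment over Z: weight each stratum by P(Z=z), not P(Z=z | X)."""
    n = len(rows)
    total = 0.0
    for z in (0, 1):
        p_z = sum(1 for r in rows if r[0] == z) / n
        total += (mean_y(rows, x=1, z=z) - mean_y(rows, x=0, z=z)) * p_z
    return total

def interventional_effect(n, confounded):
    """Rung 2: E[Y | do(X=1)] - E[Y | do(X=0)], by actually setting X."""
    return (mean_y(simulate(n, confounded, do_x=1, seed=1))
            - mean_y(simulate(n, confounded, do_x=0, seed=2)))

for confounded in (True, False):
    data = simulate(100_000, confounded)
    print(f"confounded={confounded}: "
          f"obs={observational_effect(data):.3f} "
          f"adjusted={adjusted_effect(data):.3f} "
          f"do={interventional_effect(100_000, confounded):.3f}")
```

Worked out by hand for these assumed numbers, the confounded observational difference is 0.7 minus 0.3, or 0.4 in expectation, four times the true effect, while the adjusted and interventional estimates sit near 0.1. Without confounding all three sit near 0.1, which is the equality described above.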
What is Goodhart’s Law?
“When a measure becomes a target, it ceases to be a good measure.”
A metric may correlate with a true objective. Once it is optimized directly, the correlation tends to break, because optimization pressure changes how the metric is produced: it introduces variation in the metric that was not present when the proxy was selected.
Before optimization: the proxy reliably tracks the outcome because both are driven by the same underlying causes. After optimization: the proxy is hit via a different mechanism — one that produces the proxy value without producing the underlying outcome.
Four variants
| Variant | Mechanism |
|---|---|
| Regressional | Extreme values include noise; selecting on the proxy selects partly on noise |
| Causal | Intervening on the proxy is not the same as intervening on the cause |
| Extremal | Proxy fails outside the regime in which it was validated |
| Adversarial | Agents with goals misaligned to the principal game the proxy directly |
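The Regressional variant is the easiest to reproduce. A minimal sketch, assuming a true value drawn from a standard normal and a proxy equal to that value plus independent Gaussian noise of the same scale (both choices are illustrative): selecting the top units by proxy also selects for lucky noise.

```python
import random
import statistics

def regressional_goodhart(n=10_000, top_k=100, noise_sd=1.0, seed=0):
    """Select the top_k units by proxy; return (mean proxy, mean true value) of the selection.

    Hypothetical setup: true value ~ N(0, 1), proxy = true value + N(0, noise_sd).
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        true_value = rng.gauss(0.0, 1.0)
        proxy = true_value + rng.gauss(0.0, noise_sd)
        units.append((proxy, true_value))
    # "Optimize" the proxy: keep the units that score highest on it.
    selected = sorted(units, reverse=True)[:top_k]
    mean_proxy = statistics.fmean([p for p, _ in selected])
    mean_true = statistics.fmean([t for _, t in selected])
    return mean_proxy, mean_true

mean_proxy, mean_true = regressional_goodhart()
print(f"selected on proxy: proxy mean {mean_proxy:.2f}, true mean {mean_true:.2f}")
```

With equal signal and noise variance, the expected true value given the proxy is half the proxy, so the selected group's true mean should land near half its proxy mean. Setting `noise_sd=0.0` makes the two means identical, the only case in which selecting on the proxy is selecting on the outcome.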
The connection to the Ladder
The Causal variant is where Goodhart’s Law meets the Ladder most directly. Write X for the metric and Y for the true outcome. The metric was selected because P(Y | X), a Rung 1 quantity, showed Y moving with X under natural variation. Targeting the metric asks a Rung 2 question instead: what is P(Y | do(X))? Optimization changes how X is produced, severs the natural correlation, and the proxy ceases to track the thing it was selected to measure.
The formal statement: the proxy was a valid summary of P(Y | X). It is not, in general, a valid summary of P(Y | do(X)). Any proxy should be expected to break under strong optimization pressure unless the causal model confirms that intervening on the proxy and intervening on the outcome are the same operation.
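A minimal simulation of the Causal variant, under assumed linear-Gaussian equations: a common cause Z drives both the metric M and the outcome Y, and once the metric becomes a target, teams add gaming effort G that moves M but has no path to Y. The noise scales and the gaming mean and spread are illustrative assumptions, not estimates from any real KPI.

```python
import math
import random
import statistics

def simulate_kpi(n, gaming_mean=0.0, gaming_sd=0.0, seed=0):
    """Hypothetical SCM: Z -> M, Z -> Y, G -> M. There is no edge from M or G to Y."""
    rng = random.Random(seed)
    metric, outcome = [], []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)                     # common cause, e.g. service quality
        g = rng.gauss(gaming_mean, gaming_sd)       # effort aimed at the metric only
        metric.append(z + g + rng.gauss(0.0, 0.5))  # M := Z + G + noise
        outcome.append(z + rng.gauss(0.0, 0.5))     # Y := Z + noise
    return metric, outcome

def pearson(xs, ys):
    """Pearson correlation, written out to avoid depending on Python 3.10+."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Before targeting: G is zero, so M tracks Y through Z alone.
m_before, y_before = simulate_kpi(50_000)
# After targeting: G pushes M up on average and varies across teams.
m_after, y_after = simulate_kpi(50_000, gaming_mean=1.0, gaming_sd=2.0, seed=1)

print(f"corr(M, Y) before: {pearson(m_before, y_before):.2f}")
print(f"corr(M, Y) after:  {pearson(m_after, y_after):.2f}")
print(f"mean M: {statistics.fmean(m_before):.2f} -> {statistics.fmean(m_after):.2f}")
print(f"mean Y: {statistics.fmean(y_before):.2f} -> {statistics.fmean(y_after):.2f}")
```

In expectation the correlation falls from 0.8 to about 0.39 with these settings, and the metric's mean rises by the gaming mean while the outcome's mean stays at zero. The Rung 1 relationship that justified the proxy was real; it described a distribution that targeting replaced.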
Annotation
Three things this exchange illustrates that are directly relevant to how the models on this site are built:
Whether do() equals obs() is a property of the causal graph, not the data. The same dataset is consistent both with graphs in which equality holds and with graphs in which it fails; which case applies depends on which graph is correct.
The proxy broke because the optimization was a Rung 2 operation applied to a Rung 1 summary. The model that selected the proxy was not the model that governed the intervention.
An SCM with no confounders produces the same observational and interventional answers. The SCM doesn’t eliminate Goodhart — it makes the conditions under which the proxy is valid explicit and testable.
Goodhart diagnosed the problem. Designing KPIs That Don’t Break provides the procedure for fixing it: how to select, test, and maintain metrics that hold under optimization pressure.
The early warning indicators case in Dynamic Bayesian Networks is Goodhart in a temporal model: a threshold-based early warning indicator is a proxy for an upstream shift. Once the indicator is acted on — once it becomes a target — the relationship between the indicator and the upstream state may change. A causal early warning indicator detects the upstream shift directly, which is why it cannot be gamed in the same way.
The KPI Problem Is a Causal Problem
Goodhart’s Law is usually treated as a management lesson: be careful what you measure. In causal terms it is structural and precise — and it applies to every KPI system in every organization, whether or not anyone has noticed.
Every KPI system is implicitly making a causal claim: that intervening on the metric is the same as intervening on the outcome. Almost no KPI system ever checks whether that claim is true.
Why KPIs break
A KPI is selected because it correlates with an outcome under the natural distribution of the organization’s behavior. That correlation is a Rung 1 relationship — it held in the data before the KPI was introduced. The moment the KPI becomes a target, the organization begins intervening on it. That is a Rung 2 operation.
A Rung 2 intervention produces the same result as a Rung 1 observation only when there are no confounders — no common cause of the metric and the outcome that the intervention can bypass. KPI metrics almost always have such confounders.
| KPI | Intended outcome | Confounder | How the KPI is hit without the outcome |
|---|---|---|---|
| Customer satisfaction score | Customer retention | Service quality | Agents prompt high scores; complaint resolution inflates scores without improving service |
| Claims cycle time | Claims quality and cost | Investigation rigor | Claims closed faster by reducing investigation — cycle time improves, leakage increases |
| Security training completion | Reduced breach risk | Security awareness | Completion rates hit by clicking through; awareness unchanged, risk unchanged |
| Lines of code reviewed | Code quality | Review depth | Reviews accelerated without depth; count hits target, defect rate unchanged |
| Loss ratio improvement | Underwriting quality | Portfolio mix | Ratio improves by shedding high-risk business; quality of remaining book is unchanged or worse |
The right question when a KPI is hit but the outcome hasn’t moved
This is not a management failure or a gaming problem. It is a causal identification failure. The confounder that was driving both the metric and the outcome has been decoupled from the intervention. The metric moved because the intervention reached it directly. The outcome didn’t move because the mechanism connecting them was bypassed.
The diagnostic question is structural: what common cause was driving both the metric and the outcome in the historical data, and does the intervention reach that common cause or bypass it? That is a question about the causal graph — not about effort, intent, or target-setting rigor.
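The diagnostic question can be phrased as a graph query. The sketch below assumes the causal graph is already known and written as a plain dict of node to children; the node names encode the security-training row of the table as a hypothetical DAG, and `diagnose_kpi` is an illustrative helper, not a standard routine. It treats every shared ancestor as a common cause and ignores path blocking, so it is a first-pass screen rather than a full identification analysis.

```python
from collections import deque

def descendants(graph, node):
    """Nodes reachable from `node` along directed edges (node itself excluded)."""
    seen, queue = set(), deque([node])
    while queue:
        for child in graph.get(queue.popleft(), ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

def ancestors(graph, node):
    """Nodes with a directed path to `node`."""
    return {n for n in graph if node in descendants(graph, n)}

def diagnose_kpi(graph, intervention, metric, outcome):
    """Which common causes of metric and outcome does the intervention fail to reach?"""
    common_causes = ancestors(graph, metric) & ancestors(graph, outcome)
    reached = descendants(graph, intervention) | {intervention}
    return {
        "common_causes": common_causes,
        "bypassed": common_causes - reached,
        "metric_causes_outcome": outcome in descendants(graph, metric),
        "intervention_reaches_outcome": outcome in reached,
    }

# Hypothetical DAG for the security-training row of the table.
security = {
    "security_awareness": ["training_completion", "breach_risk"],
    "completion_mandate": ["training_completion"],
}
print(diagnose_kpi(security, "completion_mandate", "training_completion", "breach_risk"))
```

For the completion mandate, the common cause `security_awareness` comes back as bypassed, and neither the metric nor the intervention has a directed path to `breach_risk`. Adding an intervention that feeds `security_awareness` (a hypothetical `awareness_program` node) empties the bypassed set. A directed path from metric to outcome is necessary for the metric to be a valid target but is not sufficient on its own.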
Three questions a causal model answers about any KPI
1. Rung 1, association: P(outcome | metric)
The question every KPI selection process answers. Necessary but not sufficient.
2. Rung 2, intervention: P(outcome | do(metric))
The question no KPI selection process answers. The only one that matters once the metric becomes a target.
3. Rung 3, counterfactual
The post-mortem question: why hitting the KPI didn’t produce the expected result, and what would have.
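The three questions can be posed against a single model. A minimal sketch, assuming a hypothetical linear SCM for the satisfaction-score row of the table: service quality Z drives both the score M and retention Y, with an assumed coefficient of 2 on Y and no edge from M to Y. Rung 3 uses the standard abduction, action, prediction steps for one unit whose values are invented for illustration.

```python
import random

B_ZY = 2.0  # hypothetical effect of service quality (Z) on retention (Y)

def sample(n, do_m=None, seed=0):
    """Draw (z, m, y) rows from a hypothetical linear SCM:
        Z := U_z,  M := Z + U_m,  Y := B_ZY * Z + U_y   (no edge from M to Y)
    Passing do_m replaces M's equation with the constant do_m.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        u_m = rng.gauss(0.0, 1.0)  # drawn even under do() so both arms share one random stream
        u_y = rng.gauss(0.0, 1.0)
        m = z + u_m if do_m is None else do_m
        y = B_ZY * z + u_y
        rows.append((z, m, y))
    return rows

def rung1_slope(rows):
    """Rung 1 (association): least-squares slope of Y on M in observational data."""
    ms = [m for _, m, _ in rows]
    ys = [y for _, _, y in rows]
    mean_m, mean_y = sum(ms) / len(ms), sum(ys) / len(ys)
    cov = sum((m - mean_m) * (y - mean_y) for m, y in zip(ms, ys))
    var = sum((m - mean_m) ** 2 for m in ms)
    return cov / var

def rung2_effect(n, m_low, m_high, seed=1):
    """Rung 2 (intervention): E[Y | do(M = m_high)] - E[Y | do(M = m_low)]."""
    def avg_y(rows):
        return sum(y for _, _, y in rows) / len(rows)
    return avg_y(sample(n, do_m=m_high, seed=seed)) - avg_y(sample(n, do_m=m_low, seed=seed))

def rung3_counterfactual_y(z, y, delta_z):
    """Rung 3 (counterfactual) for one observed unit: Y had Z been z + delta_z."""
    u_y = y - B_ZY * z                  # abduction: recover this unit's noise term
    return B_ZY * (z + delta_z) + u_y   # action and prediction with the same noise

print(f"Rung 1 slope of Y on M: {rung1_slope(sample(50_000)):.2f}")
print(f"Rung 2 effect of do(M): {rung2_effect(50_000, 0.0, 1.0):.2f}")
print(f"Rung 3 Y for unit (z=0.5, y=1.3) with Z raised by 0.5: "
      f"{rung3_counterfactual_y(0.5, 1.3, 0.5):.2f}")
```

For these assumptions the Rung 1 slope is about 1.0 (Cov(Y, M) / Var(M) = 2 / 2), the Rung 2 effect is exactly zero because Y’s equation never mentions M, and the Rung 3 answer for the example unit is 2.3. The same unit’s counterfactual under a pushed-up score is simply its observed Y: prompting for higher scores moves M and nothing else.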
A KPI is only valid as a target if the causal graph confirms that do(metric) and do(outcome) are the same operation, or that intervening on the metric necessarily causes the outcome through a mechanism that cannot be bypassed. Checking this is a Rung 2 query. It requires a causal model. In current practice it is almost never done.
If your KPI system is hitting targets while the outcomes don’t move — the causal graph already knows why. Thirty minutes to identify the confounder.
info@rung3.ai