Building a Causal Model — A Step-by-Step Counterfactual

The hardest questions in risk and decision-making are counterfactual: what would have happened to this patient, this claim, this customer — had we acted differently? Only a structured causal model can answer them. This page shows how to build one.

The Counterfactual Question

Would this patient have been hospitalised if statins had been prescribed? That is a counterfactual question: not a population statistic, but a specific claim about this individual. It is the most useful class of question at a board table, and it requires a structured causal model to answer.

The clinical domain is used here because the question is immediately legible. The structure is identical for any high-stakes decision under uncertainty:

Domain	The counterfactual question
Clinical	Would this patient have been hospitalised if statins had been prescribed?
Insurance	Would this loss have occurred if the risk had been underwritten differently?
Cyber	Would this breach have happened if endpoint detection had been deployed?
Credit	Would this default have occurred if the lending criteria had been tightened?

In each case, the same three steps apply: build the causal graph for that domain, extend it with U variables for unmeasured individual factors, then run abduction and ask the counterfactual. The answer is always individual — not a population rate, but a specific claim about this case, this decision, this outcome.

1. Build the Causal Model

Prompt

Build a causal DAG for cardiovascular hospitalisation risk. Include modifiable risk factors, comorbidities, lifestyle variables, treatment decisions, and outcome nodes. No evidence entered — show population priors as bell curves on each node.

Two paths to the same model

The causal arrows are always human-owned: they are causal claims your team is willing to defend. They are arrived at by one of two paths: expert-only structured elicitation, or human + LLM collaboration, where the model proposes structure and experts challenge and correct it.

The compounding advantage

Every causal model your team builds adds to a library of encoded domain knowledge. That library is auditable, transferable, and owned by your organization. It does not leave when the expert retires. It does not hallucinate when asked a question outside its training distribution.

Each node is a variable your team agreed matters. Each arrow is a causal claim. The bell curves reflect population-level uncertainty. Green borders mark evidence nodes (Age, TotalCholesterol, Statins). Red border marks the query node (Hospitalization).

Either way the result is the same: causal claims your organization has committed to, in a form that can answer questions no dashboard can touch. The prompts shown below each diagram are shorthand instructions for producing each step.

The Causal Model

Full causal model — no evidence entered, population priors shown as bell curves
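A causal graph of this kind can be written down as plain data before any tool is involved. The sketch below is a minimal illustration, not the page's actual model: only the chain Statins → LDLCholesterol → HeartDisease → Hospitalization and the node names come from the text, and every other arrow in `PARENTS` is an assumption. It uses Python's standard-library `graphlib` to check that the claimed arrows contain no cycle.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a node; each value is the set of its direct causes (parents).
# Only Statins -> LDLCholesterol -> HeartDisease -> Hospitalization is stated
# in the text. All other arrows here are illustrative assumptions.
PARENTS = {
    "Age": set(),
    "TotalCholesterol": {"Age"},
    "Statins": {"Age", "TotalCholesterol"},
    "LDLCholesterol": {"TotalCholesterol", "Statins"},
    "HeartDisease": {"Age", "LDLCholesterol"},
    "Hospitalization": {"Age", "HeartDisease"},
}


def causal_order(parents):
    """Return the nodes so that every cause precedes its effects.

    Raises ValueError if the arrows contain a cycle, i.e. the graph is not a DAG.
    """
    try:
        return list(TopologicalSorter(parents).static_order())
    except CycleError as exc:
        raise ValueError(f"not a DAG, cycle found: {exc.args[1]}") from exc


order = causal_order(PARENTS)
print(" -> ".join(order))
```

Keeping the arrows as a reviewable dictionary is one way to make each causal claim something a domain expert can read, challenge, and sign off on.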

Q: But how do you know the structure is right? Aren't you just baking in assumptions?
A: Yes, and that is the point. Every causal claim depends on assumptions. The choice is whether to make them explicit or hide them inside a model's training procedure.

Build & Extend the Model

2. Extend to a Structured Causal Model

Prompt

Add an exogenous U variable to each node for unmeasured background factors that will be inferred by abduction. Show all distributions at prior — no evidence entered.

A U variable has been added beside each variable. It represents everything that influences that variable but is not in the data: genetics, lifestyle, unmeasured comorbidities. The model shown is also reduced to the minimal causal subset of the full model. Right now the U variables are at their priors. In the next step, the patient's observed data will allow the model to infer what these hidden factors must have been.

The Minimal Structured Causal Model

SCM with U variables at priors — no evidence entered
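In code, a structured causal model pairs each node with an equation that takes its parents and its own U variable. The sketch below assumes a four-node chain with made-up linear equations and standard-normal priors on every U; none of these coefficients come from the page. Sampling the U variables from their priors and pushing them through the equations yields the population-level curve for Hospitalization.

```python
import random
import statistics

ORDER = ["Statins", "LDLCholesterol", "HeartDisease", "Hospitalization"]

# Hypothetical structural equations: f(parent values, own U) -> node value.
# The coefficients are invented for illustration only.
EQUATIONS = {
    "Statins": lambda v, u: 1.0 if u > 0 else 0.0,
    "LDLCholesterol": lambda v, u: 3.0 - 1.0 * v["Statins"] + u,
    "HeartDisease": lambda v, u: 0.5 * v["LDLCholesterol"] + u,
    "Hospitalization": lambda v, u: 0.8 * v["HeartDisease"] + u,
}


def sample_u(rng):
    """Draw one individual's background factors from standard-normal priors."""
    return {node: rng.gauss(0.0, 1.0) for node in ORDER}


def evaluate(u):
    """Compute every node from its parents and its own U, in causal order."""
    values = {}
    for node in ORDER:
        values[node] = EQUATIONS[node](values, u[node])
    return values


rng = random.Random(0)  # fixed seed so the sketch is repeatable
draws = [evaluate(sample_u(rng))["Hospitalization"] for _ in range(5000)]
prior_mean = statistics.fmean(draws)
prior_sd = statistics.stdev(draws)
print(f"Hospitalization prior: mean={prior_mean:.2f}, sd={prior_sd:.2f}")
```

Because all randomness lives in the U variables, fixing them pins down one individual completely, which is what the next step relies on.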

3. Answer the Counterfactual — what would have happened had statins been prescribed?

Prompt

Evidence has been entered: Age=70, TotalCholesterol=200, Statins=0, Hospitalization=1. Run abduction: compute posteriors for all U variables. Then apply do(Statins=1), hold the U variables fixed, propagate forward, and show the new Hospitalization distribution.

Abduction has been run. The model has inferred this patient's background factors. The broad curves have collapsed to narrow spikes. Statins (orange border) is fixed at zero: not prescribed. This is the factual world as it occurred.

After abduction, the model no longer represents the population. It represents this individual.

Abduction
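Why evidence collapses a broad curve into a narrow spike can be seen in the simplest possible case, a single Gaussian update. This is a sketch under assumed numbers (a unit-variance prior on one U variable and one low-noise observation of it); the model on this page may use a different inference procedure.

```python
def gaussian_posterior(prior_mean, prior_var, observed, noise_var):
    """Conjugate update: Gaussian prior plus one observation with Gaussian noise."""
    precision = 1.0 / prior_var + 1.0 / noise_var
    mean = (prior_mean / prior_var + observed / noise_var) / precision
    return mean, 1.0 / precision


# Hypothetical values: a broad prior on a U variable and one precise observation.
post_mean, post_var = gaussian_posterior(
    prior_mean=0.0, prior_var=1.0, observed=2.0, noise_var=0.01
)
print(f"posterior mean={post_mean:.3f}, posterior sd={post_var ** 0.5:.3f}")
```

The posterior variance is always smaller than both the prior variance and the noise variance, so the more precise the evidence, the narrower the inferred U.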

Now the counterfactual question is asked: what would have happened if statins had been prescribed? The model applies do(Statins=1), Pearl's notation for a direct intervention that sets a variable to a value, bypassing its normal causes. It is a supposition, not what actually happened. All hidden factors remain fixed at this patient's inferred values. Hospitalization shifts from −36.9 ± 3.59 to −37.8 ± 3.59. That shift is the individual treatment effect. No heatmap, regression, or other purely associational model can produce this.

do(Statins=1)
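The three steps (abduction, intervention, forward propagation) can be sketched in a few lines for a toy case. The example assumes an additive-noise chain in which every node is observed, so each U is recovered exactly as a residual. The page's model observes only some nodes and therefore infers U as a distribution. The equations and the patient record are invented for illustration.

```python
ORDER = ["Statins", "LDLCholesterol", "HeartDisease", "Hospitalization"]

# Hypothetical additive-noise equations: node = F[node](parents) + U[node].
F = {
    "Statins": lambda v: 0.0,
    "LDLCholesterol": lambda v: 3.0 - 1.0 * v["Statins"],
    "HeartDisease": lambda v: 0.5 * v["LDLCholesterol"],
    "Hospitalization": lambda v: 0.8 * v["HeartDisease"],
}


def abduct(observed):
    """Step 1: each U is the gap between the observed value and the equation's output."""
    return {node: observed[node] - F[node](observed) for node in ORDER}


def predict(u, do):
    """Steps 2 and 3: force the intervened nodes, keep U fixed, propagate in causal order."""
    values = {}
    for node in ORDER:
        values[node] = do[node] if node in do else F[node](values) + u[node]
    return values


# Invented factual record for one individual (statins not prescribed).
factual = {"Statins": 0.0, "LDLCholesterol": 3.4, "HeartDisease": 2.0, "Hospitalization": 1.0}

u = abduct(factual)
counterfactual = predict(u, do={"Statins": 1.0})
ite = counterfactual["Hospitalization"] - factual["Hospitalization"]
print(f"individual treatment effect: {ite:.2f}")
```

A useful sanity check on any such implementation is that `predict(u, do={})` reproduces the factual record: with no intervention, the inferred U variables must regenerate what was actually observed.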

Reading the Results

The model does not require perfect data. It requires honest domain knowledge about what causes what — which exists in every organization that has been making decisions long enough to have learned from some of them.

The observed fact is that this patient was hospitalised — Hospitalization = 1. After abduction, the model's continuous score reads −36.9 ± 3.59. The counterfactual, with statins prescribed, reads −37.8 ± 3.59. The two are almost identical.

That near-identity is not a flaw. It is an answer. The model is telling you that for this patient, the statin decision was not the primary driver of hospitalisation. The causal pathway from Statins runs through LDLCholesterol and then HeartDisease before reaching Hospitalization: three causal links, each attenuating the effect.
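The arithmetic behind the near-identity, and one way attenuation along a chain can arise, can be written out as follows. The first two lines use only the figures reported above. The last line assumes a linear model with hypothetical edge coefficients β₁ (Statins → LDLCholesterol), β₂ (LDLCholesterol → HeartDisease) and β₃ (HeartDisease → Hospitalization); the page does not state the model's functional form.

```latex
\begin{align*}
\text{ITE} &= Y_{do(\text{Statins}=1)}(u) - Y_{\text{Statins}=0}(u) \\
           &= -37.8 - (-36.9) = -0.9 \\[4pt]
\Delta\,\text{Hospitalization}
           &= \beta_3\,\beta_2\,\beta_1\,\Delta\,\text{Statins}
           \qquad \text{(assumed linear chain)}
\end{align*}
```

The shift of −0.9 is well inside the reported ± 3.59. Under the linear assumption, if each coefficient has magnitude below 1, the product is smaller in magnitude than any single link, which is one reading of "each attenuating the effect".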

The reason the score sits at −36.9 rather than near 1 — despite the observed hospitalisation — lies in the U variable. After abduction, U_Hospitalization is inferred at 5.88. That large background factor reconciles the patient's observed outcome with everything else in the model. It represents unmeasured individual circumstances. That factor is held fixed in the counterfactual.

A population statistic says statin users are hospitalised 23% less often. That is true across the population. This model says: for this patient, the treatment effect was small. Both statements are correct. Only one is useful for a decision about this individual.
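The gap between a population effect and an individual effect can be made concrete with a small simulation. The sketch assumes a toy outcome in which each person has a hidden responsiveness to treatment. The functional form, the distributions, and the patient's values are all invented and are not taken from the model on this page.

```python
import random
import statistics


def outcome(statins, u_base, u_response):
    """Toy outcome: treatment lowers the score by the individual's hidden responsiveness."""
    return u_base - u_response * statins


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: (baseline risk, responsiveness) for each individual.
population = [(rng.gauss(0.0, 1.0), rng.expovariate(1.0)) for _ in range(10_000)]

# Individual effects: same person, same U, with and without treatment.
effects = [outcome(1, b, r) - outcome(0, b, r) for b, r in population]
average_effect = statistics.fmean(effects)

# One specific patient whose inferred responsiveness happens to be low.
patient_effect = outcome(1, u_base=2.0, u_response=0.05) - outcome(0, u_base=2.0, u_response=0.05)

print(f"population average effect: {average_effect:.2f}")
print(f"this patient's effect:     {patient_effect:.2f}")
```

Both numbers are correct descriptions of the same model; only the second is conditioned on what is known about the individual.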

In risk management, the same distinction matters every time a loss occurs. A population rate tells you how often controls work across a portfolio. An individual treatment effect tells you whether this control would have prevented this loss — given this specific environment, this attack vector, this set of circumstances. That is the question a board asks after an incident. It is the question a regulator asks when reviewing a decision. It is what "root cause analysis" is attempting and failing to answer when it produces a timeline rather than a causal estimate. A structured causal model produces the estimate.

In the cases

Healthcare

Statins & Hospitalisation

The walkthrough on this page builds the statin causal model step by step — the same model used in the case study.

Next Step

The question your board asks after every significant loss is a counterfactual. Now you can answer it.

info@rung3.ai

See also

From a blank sheet to a working model — the same procedure applied to a full domain walkthrough: a property insurance rate-change decision, traced from first elicitation session to queryable model.

¹ For illustration only. · ² DAG — Directed Acyclic Graph: nodes are variables, arrows are causal claims, absence of cycles ensures a consistent probability distribution. · ³ Abduction is inference of a hypothesis from data.

