A randomized trial tells you the average treatment effect in a carefully selected population. A causal model targets the individual treatment effect for the patient in front of you, conditioning on their full covariate history and accounting for confounders that randomization never balanced. These are not the same question, and they do not have the same answer.

Q: Don't you need a randomized trial to establish causation?
A: For the strongest evidence on a population average, yes. For most clinical decisions, no, and a trial is often impossible to run. Causal models extract what observational data can tell you, name the assumptions required, and pair the answer with a sensitivity analysis showing how robust it is.

Q: Personalized medicine already has CATE estimators and risk calculators. What does an SCM contribute?
A: Rung 3. CATE estimators answer what works on average for patients who share a covariate profile. Risk calculators answer who tends to respond. Neither answers what the clinician actually asks at the bedside: would this patient have done better under the alternative? Only an SCM can. Same data; one question further up the ladder.

Standard statistical models ask: What is the average treatment effect across a population? A trial recruits patients, randomizes treatment, and reports the mean outcome difference. That number is real and useful. But it is not the number a clinician needs when a specific patient — with a specific genomic profile, disease history, and comorbidity burden — sits across the desk. Personalized medicine asks something fundamentally harder: What will happen to this patient if I give them treatment A versus treatment B?

A Structural Causal Model (SCM) is the mathematical framework that makes that question answerable. An SCM encodes the data-generating process as a set of structural equations — each variable is a deterministic function of its direct causes plus a background noise term that captures all unmeasured heterogeneity. The graph tells you what causes what. The noise terms encode the irreducible differences between patients who look identical on every measured covariate.

The key consequence: randomness in an SCM does not come from the world being non-deterministic. It comes from hidden heterogeneity — the things we haven't measured. That reframing is what makes individual-level reasoning possible in principle, even if it remains constrained by data in practice.
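As a minimal sketch of what "a deterministic function of its direct causes plus a background noise term" looks like in practice, the toy SCM below has three variables. The variable names, coefficients, and noise distributions are illustrative assumptions, not values from any clinical model.

```python
import numpy as np

def sample_scm(n, seed=0):
    """Draw n patients from a toy SCM: severity -> treatment, severity -> outcome, treatment -> outcome."""
    rng = np.random.default_rng(seed)
    # Background noise terms U: all unmeasured heterogeneity lives here.
    u_sev = rng.normal(size=n)
    u_trt = rng.uniform(size=n)
    u_out = rng.normal(size=n)
    # Structural equations F: each variable is a deterministic function
    # of its parents and its own noise term.
    severity = u_sev
    treatment = (u_trt < 1 / (1 + np.exp(-severity))).astype(float)
    outcome = 2.0 * treatment - 1.5 * severity + u_out
    return {"u_out": u_out, "severity": severity,
            "treatment": treatment, "outcome": outcome}

a = sample_scm(5, seed=42)
b = sample_scm(5, seed=42)
# Identical noise draws reproduce identical patients: nothing else is random.
same = all(np.array_equal(a[k], b[k]) for k in a)
```

Fixing the seed fixes the noise, and fixing the noise fixes every patient. That is the sense in which all randomness in the model is hidden heterogeneity.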

The Ladder of Causation

Judea Pearl's Causal Hierarchy organizes all statistical and causal questions into three rungs. Moving up the ladder requires stronger assumptions and, in return, answers harder clinical questions. Personalized medicine requires all three.

1. Association

Observing and filtering. The model uses conditional probability to update beliefs given new evidence. No causal direction is implied — evidence flows symmetrically through the graph. This is the rung of biomarker screening, risk scoring, and EHR-based prediction.

→ P(Y | X) · "Do patients on drug X tend to survive longer?"

2. Intervention

Acting on the world. The do-operator severs the influence of a variable's natural parents and sets it to a specified value. This isolates the causal effect of a decision from the confounders that tend to accompany it in observational data. Those confounders are the central pitfall of learning naively from EHRs.

→ P(Y | do(X=x)) · "If I prescribe X, what is the expected outcome?"

3. Counterfactual

Reasoning about the path not taken. Given that this specific patient received treatment A and had outcome Y, what would have happened under treatment B? This requires abduction — using the observed outcome to infer the patient's background noise terms — before applying the hypothetical intervention. Only SCMs can answer Rung 3 questions.

→ P(Y_x | X=x′, Y=y′) · "Would this patient have responded to a different regimen?"
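The gap between Rung 1 and Rung 2 can be made concrete with a small simulation. The sketch below assumes a hypothetical SCM in which disease severity Z drives both prescribing and recovery; all probabilities are made up for illustration. It compares the raw association with the backdoor-adjusted estimate of the interventional effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical SCM: severity Z confounds treatment X and recovery Y.
z = rng.binomial(1, 0.5, n)                      # 1 = severe disease
x = rng.binomial(1, np.where(z == 1, 0.8, 0.2))  # severe patients are treated more often
p_recover = 0.3 + 0.2 * x - 0.25 * z             # treatment truly adds +0.2
y = rng.binomial(1, p_recover)

# Rung 1 (association): difference in recovery between treated and untreated.
naive = y[x == 1].mean() - y[x == 0].mean()

# Rung 2 (intervention): backdoor adjustment, averaging the within-stratum
# difference over the marginal distribution of Z.
adjusted = sum(
    (y[(x == 1) & (z == v)].mean() - y[(x == 0) & (z == v)].mean()) * (z == v).mean()
    for v in (0, 1)
)
```

In this setup the naive contrast comes out near 0.05 while the adjusted one is close to the true 0.2, because treated patients are disproportionately the severe ones.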

The Fundamental Problem of Causal Inference

You can never observe both potential outcomes for the same patient. The individual treatment effect τ_i = Y_i(1) − Y_i(0) is fundamentally unobservable: you can give or withhold treatment, but not both. Every method for personalized medicine is a principled strategy for making that missing counterfactual estimable under explicit assumptions.

Individual Treatment Effect (ITE)

The ITE is the gold standard of personalized medicine: the difference in outcomes for patient i under treatment versus control.

τ_i = Y_i(1) − Y_i(0)

The ITE is never directly observable (the fundamental problem above), but in an SCM it is well-defined: once the background noise U_i is pinned down via abduction, both potential outcomes can be computed by intervening on the treatment node. Abduction yields a single value of U_i only when the structural equations can be inverted in the noise; otherwise it yields a posterior distribution, and the ITE inherits that uncertainty. The ITE is what courts, regulators, and precision oncologists want, but estimating it requires strong assumptions about the functional form of the structural equations and the distribution of background noise.

Conditional Average Treatment Effect (CATE)

The CATE is the practical target for most personalized medicine applications: the expected treatment effect conditional on a patient's measured covariate profile X = x.

τ(x) = E[Y(1) − Y(0) | X = x]

Unlike the ITE, the CATE is identifiable from observational data under standard causal assumptions (unconfoundedness, overlap, consistency). It answers: for patients who look like this, what is the average benefit of treatment? The CATE collapses individual heterogeneity within the covariate strata, but captures the heterogeneity across strata — which is often where the clinically actionable signal lives.
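The identification argument behind that claim is short enough to write out. With T denoting the binary treatment indicator, the three standard assumptions turn the counterfactual contrast into a difference of observable regressions:

```latex
\begin{aligned}
\tau(x) &= E[Y(1) - Y(0) \mid X = x] \\
        &= E[Y(1) \mid T = 1, X = x] - E[Y(0) \mid T = 0, X = x]
           && \text{(unconfoundedness, overlap)} \\
        &= E[Y \mid T = 1, X = x] - E[Y \mid T = 0, X = x]
           && \text{(consistency)}
\end{aligned}
```

Overlap is what guarantees that both conditioning events have positive probability at every x, so both regressions are defined.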

Counterfactual Outcome (SCM-specific)

The counterfactual outcome Y_x(u) is the outcome that a specific individual with background noise U = u would have had under the intervention do(X = x). It is computed via the abduction–action–prediction procedure:

1. Abduction

Use the observed evidence (the patient's actual history and outcome) to update the posterior distribution over background noise U. This pins the model to this specific patient's unobserved characteristics.

2. Action

Modify the structural model by applying the counterfactual intervention — for example, do(Treatment = Immunotherapy) even though the patient actually received chemotherapy. This severs the treatment node's connection to its parents.

3. Prediction

Propagate the modified model forward with the abducted U values held fixed to compute the counterfactual outcome. The difference from the observed outcome is the individual causal effect for this patient.
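The three steps can be sketched for the simplest possible case: a single outcome equation with additive noise, where abduction reduces to inverting the equation. The class name, the functional form, and the coefficients below are hypothetical. With non-invertible equations, abduction would return a posterior over u rather than a single value.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LinearSCM:
    """Toy outcome equation Y = (beta + delta * x) * t + gamma * x + u.
    All coefficients are hypothetical."""
    beta: float = 1.0
    delta: float = 0.5
    gamma: float = -2.0

    def outcome(self, t, x, u):
        return (self.beta + self.delta * x) * t + self.gamma * x + u

    def abduct(self, t, x, y):
        # Step 1: invert the structural equation to recover this patient's noise.
        return y - self.outcome(t, x, 0.0)

    def counterfactual(self, t_obs, x, y_obs, t_cf):
        u = self.abduct(t_obs, x, y_obs)  # abduction
        # Steps 2 and 3: set treatment to t_cf, then predict with u held fixed.
        return self.outcome(t_cf, x, u)

scm = LinearSCM()
y_obs = -3.0
y_cf = scm.counterfactual(t_obs=0, x=2.0, y_obs=y_obs, t_cf=1)
ite = y_cf - y_obs
```

Asking for the counterfactual under the treatment the patient actually received returns the observed outcome, which is a useful sanity check on any implementation.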

"The difference between the ITE and the CATE is the difference between a court verdict and a population policy — both are causal claims, but they require different evidence and carry different consequences."
— Pearl & Mackenzie, The Book of Why, 2018

| Method | Core Idea | Best When | Limitation |
| --- | --- | --- | --- |
| T-Learner | Fit separate outcome models for treated and control groups; take the difference. | Large, balanced samples with clear treatment/control separation. | High variance when treatment groups differ substantially in size or covariate distribution. |
| S-Learner | Single outcome model with treatment as a feature; CATE from the difference between predictions with treatment switched on and off. | When treatment effect is smooth and small relative to outcome variance. | Regularization can shrink the treatment effect toward zero when it is small relative to outcome noise. |
| X-Learner | Impute ITEs from each group using the other group's model; regress imputed ITEs on covariates. | Highly imbalanced treatment groups (common in EHR data). | Propagates outcome model errors into the ITE imputation stage. |
| DR-Learner | Doubly robust: combines propensity score and outcome model; consistent if either is correctly specified. | Observational data with moderate confounding and known propensity model. | Requires correct specification of at least one nuisance model. |
| Causal Forest | Random forests adapted to minimize CATE estimation error via honest splitting. | High-dimensional covariates, nonlinear heterogeneity, no parametric assumptions. | Computationally expensive; inference requires bootstrap or asymptotic theory. |
| TARNet / CFRNet | Deep nets with shared representation layer; CFRNet adds IPM regularization to balance treated/control. | Large datasets, complex nonlinear treatment heterogeneity, image or text covariates. | Requires large training sets; representation balancing can destroy predictive signal. |
| Proximal Causal Learning | Uses proxy variables to identify causal effects despite unmeasured confounders. | When hidden confounders are known to exist but cannot be measured directly. | Requires two sets of valid proxy variables, which are rarely available in practice. |
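To make the simplest of these methods concrete, here is a minimal T-learner sketch using ordinary least squares as the outcome model. The helper names and the simulated data are assumptions made for illustration. In practice the base learner would be whatever regression method suits the data, and the example relies on the one confounder being measured.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def t_learner_cate(X, t, y, X_new):
    """T-learner: one outcome model per arm, CATE = difference of predictions."""
    mu1 = fit_linear(X[t == 1], y[t == 1])
    mu0 = fit_linear(X[t == 0], y[t == 0])
    return predict_linear(mu1, X_new) - predict_linear(mu0, X_new)

# Simulated cohort (hypothetical): treatment depends on a measured covariate,
# and the true CATE is 1 + 2x.
rng = np.random.default_rng(1)
n = 5000
X = rng.normal(size=(n, 1))
t = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
true_cate = 1.0 + 2.0 * X[:, 0]
y = 0.5 * X[:, 0] + true_cate * t + rng.normal(scale=0.5, size=n)

X_new = np.array([[-1.0], [0.0], [1.0]])
cate_hat = t_learner_cate(X, t, y, X_new)  # true values are -1, 1, 3
```

Because both arm-specific models are correctly specified here, the estimates land close to the truth. With misspecified or data-starved arm models, the variance problem listed in the table shows up quickly.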

Handling Hidden Confounding

In EHR data, physicians prescribe based on factors that are not fully recorded. This creates unmeasured confounding that can bias any CATE estimator. Three strategies are commonly used: instrumental variables (exploit a source of variation in treatment assignment, such as physician prescribing style, that affects the outcome only through the treatment and is independent of the unmeasured confounders); sensitivity analysis (bound the causal effect under violations of unconfoundedness using Rosenbaum bounds or E-values); and proximal causal learning (use proxy measurements of the hidden confounder to achieve identification). Each strategy trades off different assumptions about the structure of the confounding mechanism.
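The instrumental variable idea can be illustrated with the Wald estimator on simulated data. Everything in this sketch is hypothetical: a binary instrument standing in for prescribing preference, a hidden confounder u, and a constant treatment effect of 1.0. When effects vary across patients, the same ratio identifies only a local average effect among patients whose treatment responds to the instrument, under a monotonicity assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

u = rng.normal(size=n)        # unmeasured confounder
z = rng.binomial(1, 0.5, n)   # instrument: hypothetical physician preference
# Treatment depends on both the instrument and the hidden confounder.
t = (0.8 * z + 0.8 * u + rng.normal(size=n) > 0.4).astype(float)
# Outcome: true treatment effect is 1.0; z affects y only through t.
y = 1.0 * t + 1.5 * u + rng.normal(size=n)

# Naive contrast is biased because u raises both treatment and outcome.
naive = y[t == 1].mean() - y[t == 0].mean()

# Wald estimator: effect of z on y divided by effect of z on t.
wald = (y[z == 1].mean() - y[z == 0].mean()) / (t[z == 1].mean() - t[z == 0].mean())
```

The naive contrast overshoots badly, while the Wald ratio recovers a value near 1.0 without ever observing u.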

Time-Varying Confounding

When treatment, outcome, and covariates all evolve over time — as in sepsis, cancer, or chronic disease management — standard adjustment for baseline covariates is insufficient. Intermediate variables can be both confounders (affecting later treatment decisions) and mediators (part of the causal pathway). Marginal structural models, G-estimation, and dynamic treatment regime estimation are designed specifically for this setting.
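For marginal structural models, the usual device is inverse probability of treatment weighting. The stabilized weight for patient i is sketched below. The notation is introduced here rather than taken from the text: A_t is the treatment at time t, L_t the time-varying covariates, an overbar denotes history up to that time, K is the last decision point, and the treatment history before t = 0 is empty.

```latex
SW_i = \prod_{t=0}^{K}
  \frac{P\left(A_t = a_{i,t} \mid \bar{A}_{t-1} = \bar{a}_{i,t-1}\right)}
       {P\left(A_t = a_{i,t} \mid \bar{A}_{t-1} = \bar{a}_{i,t-1},\ \bar{L}_t = \bar{l}_{i,t}\right)}
```

Weighting each patient by SW_i creates a pseudo-population in which treatment at each time no longer depends on the measured covariate history, so intermediate variables need not be conditioned on in the outcome model.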

Each item below illustrates a distinct failure mode of standard predictive modeling in medicine, and how an SCM resolves it.

Pharmacovigilance
Adverse Event Attribution — Did the Drug Cause the Injury?
When a patient develops acute kidney injury three days after starting an NSAID, the question is not "do NSAIDs increase AKI risk on average?" — that is settled. The question is "did the NSAID cause the injury in this patient, given their pre-existing CKD and perioperative dehydration?" That is a Rung 3 counterfactual claim, and it is the question at the center of every drug liability case and every post-market safety review.
Oncology
Immunotherapy vs. Chemotherapy — Who Benefits from Which?
Tumor mutation burden and PD-L1 expression are correlated with immunotherapy response — but correlation is not sufficient to decide treatment. A causal model separates the effect of the drug from the selection process by which oncologists already prescribe it to patients most likely to respond. Without a do-operator, the observational advantage of immunotherapy is confounded by exactly the biomarkers we are trying to use.
Drug Repurposing
Transporting Trial Results to Novel Target Populations
A drug found effective in a rheumatoid arthritis trial may benefit cardiovascular patients, but the trial population differed in age, comorbidity burden, and baseline risk. Do-calculus provides the formal machinery for transporting a causal effect estimated in one population to a target population with a different covariate distribution, under explicit assumptions about which variables account for the differences between the two populations.
Psychiatry
Treatment-Resistant Depression — Counterfactual Sequencing
After two failed SSRI trials, the next treatment decision is not "which antidepressant works best on average" — it is "given everything true about this patient, including the specific drugs that already failed, what would have happened under an alternative path?" That is a Rung 3 question. It requires abduction over the patient's observed treatment history and outcomes before any intervention can be modeled.
Critical Care
Sepsis — Dynamic Treatment Regimes via Causal Reinforcement Learning
Sepsis management involves sequences of interdependent decisions — fluid boluses, vasopressors, antibiotics — made at short time intervals under evolving physiological states. Standard RL learns from historical policies; causal RL learns from the interventional distribution. The difference matters because ICU physicians are not random: they administer vasopressors when they already suspect hemodynamic compromise, creating severe time-varying confounding in the observational data.
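For the drug repurposing case, the simplest transport formula can be written out. The sketch assumes a covariate set Z (a symbol introduced here) that captures every outcome-relevant difference between the two populations. P denotes the trial population and P* the target population.

```latex
P^{*}(y \mid do(x)) = \sum_{z} P(y \mid do(x), z)\, P^{*}(z)
```

The stratum-specific effects come from the trial and the stratum weights come from the target population. If some relevant difference is not captured by Z, the formula does not apply.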

The same six-step procedure governs every case in this series. Steps 1–3 are about identification — turning a clinical question into a well-posed estimand that is in principle answerable from data. Steps 4–6 are about estimation and validation — choosing the right method, stress-testing its assumptions, and establishing that the model output is trustworthy enough to inform a treatment decision.

1. Draw the DAG

Work with clinical domain experts to encode the causal structure as a directed acyclic graph. Every node is a variable; every edge is a direct causal mechanism. Hidden variables (genetic susceptibility, unmeasured severity) are represented as exogenous noise nodes U. The DAG is not inferred from data: it is a statement of scientific knowledge. Data can refute it, by violating the conditional independencies it implies, but cannot confirm it.
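A DAG elicited from experts can be stored as a plain mapping and checked for acyclicity before any analysis. The sketch below uses the standard-library graphlib module (Python 3.9 or later). The graph is a hypothetical encoding of the NSAID example, not a validated clinical model.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical DAG, written as node -> set of direct causes.
parents = {
    "ckd": set(),
    "dehydration": set(),
    "nsaid": {"ckd"},
    "aki": {"nsaid", "ckd", "dehydration"},
}

def causal_order(parents):
    """Return a topological order (causes before effects).
    Raises ValueError if the graph contains a cycle."""
    try:
        return list(TopologicalSorter(parents).static_order())
    except CycleError as err:
        raise ValueError(f"not a DAG: {err.args[1]}") from err

order = causal_order(parents)
```

The same ordering is what a simulator would use to evaluate structural equations, since every variable must be computed after its parents.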

2. Identify the Estimand

Specify precisely what is being estimated: ATE (average across all patients), CATE (average within a covariate stratum), individual counterfactual outcome, or optimal treatment policy. The estimand determines which identification assumptions are needed and which data structure is required. A vague clinical question ("is this drug better?") must become a precise causal estimand before any analysis can begin.

3. Check Identifiability

Apply the backdoor criterion, frontdoor criterion, or do-calculus to determine whether the causal estimand is identifiable from the available data structure. If confounders block identification, consider instrumental variable strategies, proxy variables, or design-based solutions such as natural experiments. If the estimand is not identifiable, no amount of data or computation will produce a valid causal estimate — document this explicitly.
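When one of the two graphical criteria holds, the estimand has a closed form in terms of observational quantities. Here Z is a set satisfying the backdoor criterion and M a set satisfying the frontdoor criterion. Both symbols are introduced for this sketch.

```latex
\begin{aligned}
\text{Backdoor:}\quad  & P(y \mid do(x)) = \sum_{z} P(y \mid x, z)\, P(z) \\
\text{Frontdoor:}\quad & P(y \mid do(x)) = \sum_{m} P(m \mid x) \sum_{x'} P(y \mid x', m)\, P(x')
\end{aligned}
```

The frontdoor expression is notable because it identifies the effect even when the confounder between X and Y is entirely unmeasured, provided the mediator set M is itself unconfounded in the required sense.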

4. Choose an Estimator

Match the estimator to the data structure: T-learner for large, balanced samples; X-learner for imbalanced observational samples; DR-learner when a propensity model is available; Causal Forest for high-dimensional, nonlinear heterogeneity; TARNet or CFRNet for large-scale deep learning settings. For time-varying treatments, use marginal structural models or G-estimation. For counterfactuals, the full SCM (abduction–action–prediction) is required.
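As one worked formula, the DR-learner builds a doubly robust pseudo-outcome for each patient and then regresses it on the covariates. The hatted quantities are fitted nuisance models, with symbols chosen for this sketch: \hat{\mu}_1 and \hat{\mu}_0 are the outcome models for the treated and control arms, and \hat{e} is the propensity score.

```latex
\hat{\varphi}_i = \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i)
  + \frac{T_i \left(Y_i - \hat{\mu}_1(X_i)\right)}{\hat{e}(X_i)}
  - \frac{(1 - T_i) \left(Y_i - \hat{\mu}_0(X_i)\right)}{1 - \hat{e}(X_i)},
\qquad
\hat{\tau}(x) = \hat{E}\left[\hat{\varphi} \mid X = x\right]
```

The first two terms are the T-learner estimate. The two weighted residual terms correct it using the propensity score, which is where the double robustness comes from.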

5. Assess Sensitivity

All causal estimates rest on untestable assumptions. Quantify the robustness of the estimate to violations of those assumptions: Rosenbaum bounds for hidden confounders, E-values for unmeasured confounding strength, refutation tests (random common cause, placebo treatment, data subset). A causal estimate without a sensitivity analysis is not ready for clinical decision support.
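The E-value has a closed form for a risk ratio, due to VanderWeele and Ding. The function below is a small sketch of that formula for a point estimate only. Applying it to the confidence limit closest to the null is the usual companion step and is not shown.

```python
import math

def e_value(rr):
    """E-value for a risk ratio point estimate.

    The minimum strength of association (on the risk ratio scale) that an
    unmeasured confounder would need with both treatment and outcome to
    fully explain away the observed risk ratio.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # protective effects are inverted first
    return rr + math.sqrt(rr * (rr - 1))

ev = e_value(2.0)  # 2 + sqrt(2), about 3.41
```

An observed risk ratio of 2 could be fully explained away only by a confounder associated with both treatment and outcome by a risk ratio of about 3.4 each. Whether such a confounder is plausible is a clinical judgment.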

6. Validate Causally

Predictive validation (held-out R²) is insufficient for causal models. Validate against external RCT data where available; use policy-level metrics (improvement in outcomes under the estimated optimal policy); and exploit natural experiments or quasi-experimental designs to benchmark the model's causal claims against variation that is as good as randomized. If no external validation is feasible, document the assumptions and their clinical implications explicitly.
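One policy-level metric can be sketched directly: the inverse-probability-weighted value of a treatment policy, evaluated on data where the treatment probabilities are known by design. The function name and the simulated trial are assumptions made for illustration.

```python
import numpy as np

def ipw_policy_value(y, t, policy_t, propensity):
    """IPW estimate of the mean outcome if every patient followed policy_t.

    propensity[i] is P(T = 1 | X_i), known by design in a randomized trial.
    """
    p_followed = np.where(policy_t == 1, propensity, 1 - propensity)
    weights = (t == policy_t) / p_followed
    return np.mean(weights * y)

# Simulated 1:1 randomized trial (hypothetical): treatment helps only when x > 0.
rng = np.random.default_rng(3)
n = 100_000
x = rng.normal(size=n)
t = rng.binomial(1, 0.5, n)
y = x * t + rng.normal(size=n)

e = np.full(n, 0.5)
v_policy = ipw_policy_value(y, t, (x > 0).astype(int), e)  # treat only if x > 0
v_all = ipw_policy_value(y, t, np.ones(n, dtype=int), e)   # treat everyone
```

In this simulation the targeted policy has a value near 0.4 while treating everyone has a value near 0. The comparison measures whether the estimated heterogeneity actually improves outcomes, which held-out predictive accuracy cannot show.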

Structural Causal Model (SCM)
A tuple M = ⟨U, V, F, P(U)⟩ where U are exogenous background variables, V are observed endogenous variables, F are structural equations specifying how each V is caused by its parents and its background noise, and P(U) is the distribution over background variables.
Individual Treatment Effect (ITE)
τ_i = Y_i(1) − Y_i(0): the difference between a specific patient's outcomes under treatment and control. Fundamentally unobservable — both potential outcomes are never simultaneously realized. Estimable under SCM assumptions via the abduction–action–prediction procedure.
Conditional Average Treatment Effect (CATE)
τ(x) = E[Y(1) − Y(0) | X = x]: the expected treatment effect for patients with covariate profile x. Identifiable from observational data under unconfoundedness and overlap. The principal target of most personalized medicine estimation methods.
do-calculus
A formal algebra developed by Pearl for deriving interventional distributions P(Y | do(X=x)) from observational data and a causal graph. The do-operator severs a variable's connection to its parents in the graph, simulating an external intervention rather than passive observation.
Abduction
Step 1 of the counterfactual procedure: using observed evidence (including the actual outcome) to update the posterior distribution over background noise variables U. Abduction pins the model to this specific patient's unmeasured characteristics before a hypothetical intervention is applied.
Confounding by Indication
A specific type of confounding endemic to observational medical data: physicians prescribe treatments based on patient characteristics that also predict the outcome. The same factors that make a patient likely to receive immunotherapy also make them more likely to respond — creating spurious correlation between treatment and outcome that naïve analysis mistakes for a causal effect.
Identifiability
A causal estimand is identifiable if it can be expressed as a function of the observational distribution P(V) using the causal graph alone, without requiring access to unobserved variables or experimental interventions. The backdoor and frontdoor criteria are the standard tools for checking identifiability.
Positivity (Overlap)
The assumption that every patient covariate profile has a non-zero probability of receiving each treatment: 0 < P(T=1 | X=x) < 1 for all x in the support. Violations — common in EHR data where some subpopulations never receive certain treatments — make CATE estimation impossible without extrapolation.
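For discrete covariate strata, a first-pass positivity check is a simple count of patients per arm. The helper below and its toy cohort are hypothetical. With continuous or high-dimensional covariates, the analogous check would inspect estimated propensity scores instead.

```python
import numpy as np

def overlap_violations(strata, t, min_count=1):
    """List (stratum, n_control, n_treated) for every stratum in which
    either arm has fewer than min_count patients."""
    violations = []
    for s in np.unique(strata):
        in_stratum = strata == s
        n_treated = int(t[in_stratum].sum())
        n_control = int(in_stratum.sum()) - n_treated
        if min(n_treated, n_control) < min_count:
            violations.append((str(s), n_control, n_treated))
    return violations

# Toy cohort: nobody aged 65+ received the treatment.
strata = np.array(["<65", "<65", "<65", "65+", "65+", "65+"])
t = np.array([0, 1, 1, 0, 0, 0])
violations = overlap_violations(strata, t)
```

Any stratum returned here is one where a CATE estimate would rest entirely on extrapolation from other strata.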

Pearl, J. & Mackenzie, D. (2018). The Book of Why. Basic Books. · Pearl, J. (2009). Causality: Models, Reasoning, and Inference. Cambridge University Press. · Künzel, S.R. et al. (2019). "Metalearners for estimating heterogeneous treatment effects." PNAS. · Wager, S. & Athey, S. (2018). "Estimation and inference of heterogeneous treatment effects using random forests." JASA. · Shalit, U. et al. (2017). "Estimating individual treatment effect: generalization bounds and algorithms." ICML.