A clinical protocol is a causal claim. "Patients meeting criteria X should receive treatment Y" asserts that the intervention improves the outcome on average, under the conditions the guideline assumes. The claim is testable. Where it fails on a specific patient, the failure is identifiable — but only if the model is structural.
The Core Problem
Standard evidence-based medicine asks: What is the average treatment effect across a study population? A randomised trial recruits, randomises, and reports a mean outcome difference. The number is real. A guideline committee then converts that number into a protocol: patients with characteristics X should receive treatment Y. The conversion is a causal claim about every patient meeting the criteria — not just those in the trial — and it rests on assumptions the guideline rarely names.
A Structural Causal Model (SCM) writes those assumptions down. The graph encodes which variables cause which others. The structural equations encode how. The model makes the difference between "this protocol is right on average" and "this protocol applies to this patient" a question that can be checked against data, rather than a tacit clinical inference.
The key consequence: a guideline's failure on an individual patient is rarely a failure of the trial. It is a failure of transport — the trial's average does not apply to the case because the underlying causal structure differs. SCMs identify when transport fails and why.
The Ladder of Causation
Judea Pearl's Causal Hierarchy organises clinical questions into three rungs. Most protocols are written at Rung 1 (the trial's observed association) but applied as if they were Rung 2 claims (an intervention in this patient). The mismatch is where harm enters.
Association
What was observed in the trial. Patients on statins had 25% fewer cardiovascular events than matched controls. The number is the population average. It is not, in itself, a claim about what any specific patient would experience under the same prescription.
→ P(Y | X) · "What was the event rate among patients prescribed X?"
Intervention
What the protocol assumes. If we prescribe a statin to this patient, their expected event risk falls by the 25% relative reduction the trial reported. This is a causal claim, not an observational one, and it is valid only when the patient's covariate profile and outcome mechanism match those of the trial population. The do-operator makes that assumption explicit.
→ P(Y | do(X=x)) · "If I prescribe X to a patient with profile Z, what is the expected outcome?"
Counterfactual
What the audit would need to answer. This patient followed the protocol and had a poor outcome. Would they have done better under the alternative path the guideline rejects? Only Rung 3 lets a clinical-governance review decide whether a bad outcome was the protocol's failure on this patient, or the inherent residual risk the trial already quantified.
→ P(Y_x | X=x′, Y=y′) · "Would this patient have responded under the alternative?"
The same trial average can be unbiased for the population and badly biased for some subpopulations. A causal model identifies the subgroups where the average breaks. Without one, the failure is invisible until the adverse-event review — by which time the patient has already absorbed the consequence.
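The gap between Rung 1 and Rung 2 can be made concrete with a small exact calculation. The sketch below assumes a single binary confounder Z (disease severity) that drives both prescribing and outcome, as would be typical of routine EHR data rather than a randomised trial. All probabilities are hypothetical and chosen only to show that P(Y | X) and P(Y | do(X)) can disagree in sign.

```python
# Hypothetical numbers for illustration only; not taken from any trial.
P_Z = {0: 0.5, 1: 0.5}            # P(Z=z); z=1 means severe disease
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}   # P(X=1 | Z=z); sicker patients are treated more often
P_Y1_GIVEN_XZ = {                 # P(Y=1 | X=x, Z=z), keyed by (x, z)
    (0, 0): 0.10, (1, 0): 0.05,
    (0, 1): 0.40, (1, 1): 0.30,
}


def p_x_given_z(x, z):
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1


def observed_risk(x):
    """Rung 1: P(Y=1 | X=x). Strata are weighted by P(z | x)."""
    joint = {z: P_Z[z] * p_x_given_z(x, z) for z in P_Z}
    p_x = sum(joint.values())
    return sum(P_Y1_GIVEN_XZ[(x, z)] * joint[z] / p_x for z in P_Z)


def interventional_risk(x):
    """Rung 2: P(Y=1 | do(X=x)) by backdoor adjustment. Strata are weighted by P(z)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


naive_diff = observed_risk(1) - observed_risk(0)
causal_diff = interventional_risk(1) - interventional_risk(0)
print(f"observed difference:       {naive_diff:+.3f}")
print(f"interventional difference: {causal_diff:+.3f}")
```

In this toy model the treatment lowers risk in both severity strata, yet the raw comparison makes it look harmful because treated patients are sicker. The adjustment is only valid under the stated assumption that Z is the sole confounder and is measured.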
Key Estimands
Average Treatment Effect (ATE)
The ATE is what trials report and guidelines apply: the expected outcome difference if every eligible patient received the treatment, versus if none did.
The ATE is the protocol-level estimand. It answers "is the rule worth having?" — not "does the rule apply to this patient?". Guidelines that present only an ATE leave the second question unanswered, transferring the structural inference to the clinician at the point of care.
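Written in the do-notation used for Rung 2, and assuming a binary treatment X (1 = treated, 0 = untreated) with outcome Y, the ATE can be stated as:

```latex
\[
\mathrm{ATE} \;=\; \mathbb{E}\big[\,Y \mid do(X=1)\,\big] \;-\; \mathbb{E}\big[\,Y \mid do(X=0)\,\big]
\]
```

The expectation is taken over the whole eligible population, which is why the quantity says nothing about any one patient.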
Conditional Average Treatment Effect (CATE)
The CATE refines the ATE by stratifying on observed covariates. It is the operational target of protocol-level personalisation: not "what works on average" but "for whom does the average hold".
When the CATE varies substantially across strata, the protocol's uniform rule masks meaningful heterogeneity. Identifying the strata where the protocol fails — and the structural reason for the failure — is the contribution of a causal model to guideline writing. Most flat protocols in current use have not been tested against their own CATE.
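With Z denoting the stratifying covariates, the CATE is E[Y | do(X=1), Z=z] − E[Y | do(X=0), Z=z]. The sketch below tests a flat rule against its own CATE using hypothetical randomised-trial counts stratified by baseline renal function. The strata, counts, and variable names are assumptions for illustration, not results from any study.

```python
# Hypothetical trial counts, stratified by baseline renal function.
# Each arm is recorded as (patients, events).
TRIAL = {
    "eGFR >= 60": {"treated": (1000, 80), "control": (1000, 120)},
    "eGFR < 60": {"treated": (200, 50), "control": (200, 46)},
}


def risk(patients, events):
    return events / patients


def cate(stratum):
    """Risk difference (treated minus control) within one stratum."""
    arms = TRIAL[stratum]
    return risk(*arms["treated"]) - risk(*arms["control"])


def ate():
    """Pooled risk difference. Pooling is valid here because the
    allocation ratio (1:1) is the same in every stratum."""
    pooled = {}
    for arm in ("treated", "control"):
        patients = sum(TRIAL[s][arm][0] for s in TRIAL)
        events = sum(TRIAL[s][arm][1] for s in TRIAL)
        pooled[arm] = risk(patients, events)
    return pooled["treated"] - pooled["control"]


print(f"ATE: {ate():+.3f}")
for stratum in TRIAL:
    print(f"CATE[{stratum}]: {cate(stratum):+.3f}")
```

Here the pooled estimate shows benefit while the smaller stratum shows a small increase in risk. A real analysis would also need interval estimates, since subgroup differences of this size are often within sampling noise.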
Iatrogenic Effect Decomposition
When polypharmacy creates an adverse outcome, the question is not "did the disease progress?" but "how much of the symptom is the disease, and how much is the treatment of another condition?" An SCM decomposes the observed outcome into the contribution from disease progression and the contribution from drug-on-drug interaction:
Identify the two cause paths
Draw the DAG showing both pathways into the observed symptom: disease → symptom, and co-medication → drug interaction → symptom. Each path is a separable contribution that the structural model holds apart.
Estimate the drug-interaction path
Use the patient's polypharmacy record and known pharmacokinetic interactions to attribute the portion of the symptom load to the drug-on-drug pathway. This is the counterfactual: what the symptom would be in the absence of the co-medication, holding disease severity fixed.
Choose the intervention
If the drug-interaction contribution is large, de-prescribing one of the interacting agents is structurally indicated — increasing the dose of the symptomatic treatment is contraindicated. The model converts a clinical impasse into an arithmetic comparison.
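The three steps above can be sketched with a deliberately simple linear structural equation. The coefficients, the `Patient` fields, and the additive form are all hypothetical assumptions. A real model would take its interaction term from pharmacokinetic data. The counterfactual follows the usual abduction, action, prediction sequence.

```python
from dataclasses import dataclass

# Hypothetical linear structural equation (illustrative coefficients):
#   symptom = B_DISEASE * disease + B_INTERACTION * interaction + u
B_DISEASE = 1.5
B_INTERACTION = 4.0


@dataclass
class Patient:
    disease: float      # disease severity score
    interaction: int    # 1 if both interacting agents are prescribed, else 0
    symptom: float      # observed symptom load


def symptom_model(disease, interaction, u):
    return B_DISEASE * disease + B_INTERACTION * interaction + u


def decompose(patient):
    # Abduction: recover this patient's exogenous term from what was observed.
    u = patient.symptom - symptom_model(patient.disease, patient.interaction, 0.0)
    # Action + prediction: remove the co-medication, hold disease severity and u fixed.
    without_comedication = symptom_model(patient.disease, 0, u)
    interaction_part = patient.symptom - without_comedication
    return {
        "observed": patient.symptom,
        "without_comedication": without_comedication,
        "interaction_share": interaction_part / patient.symptom,
    }


result = decompose(Patient(disease=2.0, interaction=1, symptom=7.5))
print(result)
```

For this hypothetical patient just over half of the symptom load sits on the drug-interaction path, which is the situation where de-prescribing rather than dose escalation is indicated.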
Identification Strategies
| Strategy | Core Idea | Best When | Limitation |
|---|---|---|---|
| Backdoor Adjustment | Control for a set of measured covariates that blocks all confounding paths from treatment to outcome. | The clinical context is well-understood and the relevant covariates are recorded in the EHR. | Fails silently when an unmeasured confounder remains; cannot distinguish a structural failure from a coding gap. |
| Instrumental Variables | Exploit a source of variation in treatment (e.g., physician prescribing preference, formulary changes) that affects the outcome only through the treatment and is independent of unmeasured confounders. | Variation across providers or institutions can be treated as plausibly random. | Valid instruments are rare; weak instruments give wide and biased estimates. |
| Transport Formula | Re-weight or re-estimate a trial's effect to a target population whose covariate distribution differs from the trial's. | The trial population and the clinical population differ on identifiable covariates the trial recorded. | Requires correct specification of the selection mechanism; collapses when the target population includes covariate values absent from the trial. |
| Mediation Analysis | Decompose the treatment's effect into a direct effect on the outcome and an indirect effect through a measured mediator. | The protocol's mechanism passes through an identifiable intermediate variable (a biomarker, a downstream complication). | Requires no unmeasured mediator-outcome confounding; sensitive to mediator measurement error. |
| Negative Control Outcomes | Choose an outcome that should be unaffected by the treatment but shares the confounding structure; a non-null effect on the negative control signals residual bias in the target estimate. | A plausible negative control outcome exists and is reliably recorded in the same dataset. | Requires structural similarity between the negative and target outcomes; misidentified controls give false reassurance. |
| Sensitivity Analysis | Bound the causal estimate under specified violations of unconfoundedness (Rosenbaum bounds, E-values). | The estimate is to be defended in a regulatory or governance setting where the unconfoundedness assumption will be challenged. | Sets bounds, not point estimates; does not identify which confounder is missing, only the strength such a confounder would need. |
| Mechanistic Constraints | Encode known pharmacokinetic or physiological mechanisms as structural equations the model cannot violate. | The clinical mechanism is well-characterised independently of the data (drug interactions, dose-response curves). | Constraints are only as good as the published mechanism; out-of-distribution patients can fall outside the validated range. |
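
As one worked instance of the strategies in the table, the instrumental-variable row can be sketched with the Wald ratio for a binary instrument. The sketch assumes a formulary change Z that shifts prescribing but affects the outcome only through the treatment. The input proportions and the `min_first_stage` threshold are hypothetical. Under the usual IV assumptions plus monotonicity, the ratio estimates the effect among patients whose treatment the instrument changes.

```python
def wald_estimate(y_z1, y_z0, x_z1, x_z0, min_first_stage=0.1):
    """Wald ratio for a binary instrument.

    y_z1, y_z0: mean outcome when the instrument is on / off
    x_z1, x_z0: proportion treated when the instrument is on / off
    min_first_stage: illustrative cut-off below which the instrument
        is treated as too weak to use (an assumption, not a standard value)
    """
    first_stage = x_z1 - x_z0
    if abs(first_stage) < min_first_stage:
        raise ValueError(
            f"weak instrument: treatment uptake differs by only {first_stage:.3f}"
        )
    reduced_form = y_z1 - y_z0
    return reduced_form / first_stage


# Hypothetical summary figures before and after a formulary change.
effect = wald_estimate(y_z1=0.18, y_z0=0.20, x_z1=0.70, x_z0=0.30)
print(f"IV estimate of the risk difference: {effect:+.3f}")
```

The guard on the first stage reflects the limitation in the table: when the instrument barely moves prescribing, the ratio divides by a number close to zero and the estimate becomes unstable.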
Handling Time-Varying Treatments
Chronic-disease protocols are not single decisions but sequences. Statin therapy is reviewed and titrated over years; polypharmacy regimens evolve as comorbidities accumulate. Standard adjustment for baseline covariates is insufficient when intermediate variables — current LDL, current renal function, current symptom burden — are both confounders of later decisions and mediators of earlier ones. Marginal structural models and G-estimation are the standard tools for this setting; they decouple the time-varying confounding from the time-varying mediation that standard regression cannot tell apart.
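For reference, a marginal structural model is commonly fitted by weighting each patient with stabilised inverse-probability-of-treatment weights. The notation below is an assumption introduced here: A_t is the treatment decision at visit t, L_t the time-varying covariates (current LDL, renal function, symptom burden), and overbars denote history up to that visit.

```latex
\[
SW_i \;=\; \prod_{t=0}^{T}
\frac{P\big(A_t = a_{i,t} \,\big|\, \bar{A}_{t-1} = \bar{a}_{i,t-1}\big)}
     {P\big(A_t = a_{i,t} \,\big|\, \bar{A}_{t-1} = \bar{a}_{i,t-1},\ \bar{L}_{t} = \bar{l}_{i,t}\big)}
\]
```

Weighting by SW_i creates a pseudo-population in which treatment at each visit no longer depends on the measured covariate history, so L_t does not need to be conditioned on in the outcome model and its role as a mediator of earlier treatment is left intact.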
A trial population almost never matches the clinical population the protocol is applied to. The age distribution differs; the comorbidity burden differs; the polypharmacy environment differs. A guideline that does not name these differences and the structural assumptions under which the average still holds is silently extrapolating. SCMs make the extrapolation visible and boundable.
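A minimal sketch of that extrapolation, assuming the stratum-specific effects measured in the trial carry over unchanged to the clinic and only the covariate mix differs. The age bands, effects, and mixes are hypothetical. The function refuses to transport when the target population contains a stratum the trial never observed.

```python
import math

# Hypothetical stratum-specific risk differences from a trial, by age band.
TRIAL_CATE = {"<65": -0.05, "65-79": -0.03}
TRIAL_MIX = {"<65": 0.7, "65-79": 0.3}    # covariate distribution in the trial
CLINIC_MIX = {"<65": 0.3, "65-79": 0.7}   # covariate distribution in the target clinic


def transported_effect(stratum_effects, target_mix):
    """Re-weight stratum-specific effects to a target covariate distribution."""
    if not math.isclose(sum(target_mix.values()), 1.0):
        raise ValueError("target covariate distribution must sum to 1")
    missing = [s for s, w in target_mix.items() if w > 0 and s not in stratum_effects]
    if missing:
        raise ValueError(f"no trial evidence for strata: {missing}")
    return sum(stratum_effects[s] * w for s, w in target_mix.items() if w > 0)


trial_ate = transported_effect(TRIAL_CATE, TRIAL_MIX)
clinic_ate = transported_effect(TRIAL_CATE, CLINIC_MIX)
print(f"trial ATE:       {trial_ate:+.3f}")
print(f"transported ATE: {clinic_ate:+.3f}")
```

Adding an "80+" band to the target mix raises an error instead of returning a number, which is the visible form of the extrapolation the paragraph describes.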
Applications in Medicine
Each case below illustrates a distinct failure mode of population-level evidence applied as protocol, and how an SCM identifies the failure from EHR data the system already records.
The companion hub Personalized Medicine runs the same critique at the individual-counterfactual level — Rung 3 questions about a specific patient, not Rung 1 critiques of a population rule. The two hubs share a thesis and divide the work: clinical decision-making is about the rule; personalized medicine is about the rule's application to the individual.
The Practical Framework
The same six-step procedure governs every case in this hub. Steps 1–3 turn a guideline-level claim into a causal estimand identifiable from the available data. Steps 4–6 choose an estimator, stress-test the structural assumptions, and validate the result against external evidence — producing an artefact a clinical-governance committee can defend.
Draw the DAG
Work with clinical leads to encode the protocol's implicit causal structure as a directed acyclic graph. Every node is a variable; every edge is a direct causal mechanism the protocol relies on. Hidden variables — unmeasured severity, unrecorded co-medication, undiagnosed comorbidity — appear as exogenous noise nodes U whose distribution must be estimated or bounded. The DAG is a statement of clinical knowledge that data refines but cannot overturn.
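A DAG of this kind can be held as a plain mapping from each variable to its direct causes and checked for cycles before any estimation starts. The sketch below uses Python's standard-library `graphlib`. The node names are hypothetical, and `severity_U` stands for an unmeasured exogenous node.

```python
from graphlib import CycleError, TopologicalSorter

# Each node maps to its direct causes (parents). Names are illustrative.
PARENTS = {
    "severity_U": [],                        # unmeasured severity
    "age": [],
    "statin": ["age", "severity_U"],         # prescribing depends on age and severity
    "ldl": ["statin"],                       # treatment acts through LDL
    "cv_event": ["ldl", "age", "severity_U"],
}


def causal_order(parents):
    """Return the variables in an order where causes precede effects.

    Raises ValueError if the graph contains a cycle and so is not a DAG.
    """
    try:
        return list(TopologicalSorter(parents).static_order())
    except CycleError as err:
        raise ValueError(f"graph is not acyclic: {err.args[1]}") from err


order = causal_order(PARENTS)
print(order)
```

Keeping the graph as data rather than as a diagram means the clinical leads' edits can be reviewed and version-controlled like any other change to the protocol.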
Identify the Estimand
Translate the protocol's claim into a precise causal target: ATE (does the rule work on average in this population?), CATE (does it work for this subgroup?), counterfactual (would this patient have benefited under the alternative?), or transport (does the trial's effect carry to this clinical setting?). Each estimand demands different identification assumptions and different data structure.
Check Identifiability
Apply the backdoor criterion, frontdoor criterion, or do-calculus to determine whether the estimand is identifiable from the EHR data available. If unmeasured confounders block identification, escalate to instrumental variables, proxy variables, or mechanistic constraints. If the estimand is still not identifiable, document this explicitly: no audit committee should be asked to defend a number that no method can produce.
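When one of the first two criteria holds, the interventional distribution reduces to an expression in observable quantities. The symbols are assumptions introduced here: Z is a covariate set satisfying the backdoor criterion relative to (X, Y), and M is a mediator set satisfying the frontdoor criterion.

```latex
% Backdoor adjustment
\[
P\big(y \mid do(x)\big) \;=\; \sum_{z} P(y \mid x, z)\, P(z)
\]

% Frontdoor adjustment
\[
P\big(y \mid do(x)\big) \;=\; \sum_{m} P(m \mid x) \sum_{x'} P(y \mid x', m)\, P(x')
\]
```

If neither criterion applies, do-calculus may still yield an identifying expression. If it does not, the estimand is not identifiable from the observed variables without further assumptions.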
Choose an Estimator
Match the estimator to the data and the estimand: outcome regression with backdoor adjustment when all confounders are measured; instrumental variables when confounding is unmeasured and a valid instrument exists; marginal structural models or G-estimation for time-varying treatments and confounders; mediation analysis for decomposing effect pathways. The choice is structural, not aesthetic: the wrong estimator gives the wrong number even with the right DAG.
Assess Sensitivity
Every protocol-level estimate rests on untestable assumptions. Quantify robustness: Rosenbaum bounds and E-values for hidden confounding; negative control outcomes for structural bias detection; refutation tests for placebo treatment and random common cause. A guideline change that cannot survive its own sensitivity analysis should not be made.
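As one concrete robustness check, the E-value for a risk ratio (VanderWeele and Ding) can be computed in a few lines. It is the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both treatment and outcome to fully explain away the estimate. The function name is an illustrative choice, and the sketch covers the point estimate only, not the confidence limit.

```python
import math


def e_value(risk_ratio):
    """E-value for an observed risk ratio (point estimate only)."""
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective effects are inverted so the formula applies to RR >= 1.
    rr = risk_ratio if risk_ratio >= 1 else 1.0 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1.0))


# A 25% relative reduction in events corresponds to a risk ratio of 0.75.
print(f"E-value for RR = 0.75: {e_value(0.75):.2f}")
```

For a risk ratio of 0.75 the E-value is 2.0: a hidden confounder would need a risk ratio of at least 2 with both prescribing and outcome to move the estimate to the null. Whether a confounder of that strength is plausible is a clinical judgment, not a statistical one.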
Validate Causally
Predictive validation (held-out AUC) is insufficient — a model can predict accurately and still be causally wrong. Validate against external trial evidence, against natural experiments (a formulary change, a guideline update), or against negative control outcomes that share confounding structure. Where no external benchmark is available, document the structural assumptions and their clinical implications so the governance committee can see what is being assumed on their behalf.
Pearl, J. & Mackenzie, D. (2018). The Book of Why. Basic Books. · Pearl, J. (2009). Causality: Models, Reasoning, and Inference. Cambridge University Press. · Künzel, S.R. et al. (2019). "Metalearners for estimating heterogeneous treatment effects." PNAS. · Wager, S. & Athey, S. (2018). "Estimation and inference of heterogeneous treatment effects using random forests." JASA. · Shalit, U. et al. (2017). "Estimating individual treatment effect: generalization bounds and algorithms." ICML.