A clinical protocol is a causal claim. "Patients meeting criteria X should receive treatment Y" asserts that the intervention improves the outcome on average, under the conditions the guideline assumes. The claim is testable. Where it fails on a specific patient, the failure is identifiable — but only if the model is structural.

Q: Subgroup analysis and drug-interaction screening have been clinical practice for decades. What does an SCM add?
A: Not the calculation, but the framework. Clinicians have been deriving these effects one study at a time, hand-coding interaction terms, pre-registering subgroup analyses. The SCM doesn't replace any of that. It makes the assumptions behind each derivation explicit, auditable, and composable across studies, so the next question doesn't require re-deriving the answer from scratch. Same arithmetic. Shared graph behind it.

Standard evidence-based medicine asks: What is the average treatment effect across a study population? A randomised trial recruits, randomises, and reports a mean outcome difference. The number is real. A guideline committee then converts that number into a protocol: patients with characteristics X should receive treatment Y. The conversion is a causal claim about every patient meeting the criteria — not just those in the trial — and it rests on assumptions the guideline rarely names.

A Structural Causal Model (SCM) writes those assumptions down. The graph encodes which variables cause which others. The structural equations encode how. The model makes the difference between "this protocol is right on average" and "this protocol applies to this patient" a question that can be checked against data, rather than a tacit clinical inference.

The key consequence: a guideline's failure on an individual patient is rarely a failure of the trial. It is a failure of transport — the trial's average does not apply to the case because the underlying causal structure differs. SCMs identify when transport fails and why.

The Ladder of Causation

Judea Pearl's Causal Hierarchy organises clinical questions into three rungs. Most protocols are written at Rung 1 (the trial's observed association) but applied as if they were Rung 2 claims (an intervention in this patient). The mismatch is where harm enters.

1. Association

What was observed in the trial. Patients on statins had 25% fewer cardiovascular events than matched controls. The number is the population average. It is not, in itself, a claim about what any specific patient would experience under the same prescription.

→ P(Y | X) · "What was the event rate among patients prescribed X?"

2. Intervention

What the protocol assumes. If we prescribe a statin to this patient, their expected event risk is 25% lower than it would be without treatment. This is a causal claim, not an observational one, and it is valid only when the patient's covariate profile and outcome mechanism match those of the trial population. The do-operator makes that assumption explicit.

→ P(Y | do(X=x)) · "If I prescribe X to a patient with profile Z, what is the expected outcome?"

3. Counterfactual

What the audit would need to answer. This patient followed the protocol and had a poor outcome. Would they have done better under the alternative path the guideline rejects? Only Rung 3 lets a clinical-governance review decide whether a bad outcome was the protocol's failure on this patient, or the inherent residual risk the trial already quantified.

→ P(Y_x | X=x′, Y=y′) · "Would this patient have responded under the alternative?"
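The gap between Rung 1 and Rung 2 can be made concrete with a small simulation. The sketch below assumes a hypothetical population in which frailty (Z) makes prescription less likely and events more likely; the variable names and every coefficient are illustrative assumptions, not trial data. It compares the observed association with the interventional contrast obtained by backdoor adjustment over Z.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical SCM (all coefficients are illustrative assumptions):
#   Z: frailty, X: statin prescribed, Y: cardiovascular event
z = rng.random(n) < 0.3                      # 30% of patients are frail
x = rng.random(n) < np.where(z, 0.2, 0.7)    # frail patients are prescribed less often
p_event = 0.10 + 0.20 * z - 0.04 * x         # true effect of X is -0.04 in every stratum
y = rng.random(n) < p_event

# Rung 1: observed association, P(Y | X=1) - P(Y | X=0)
assoc = y[x].mean() - y[~x].mean()

# Rung 2: backdoor adjustment over Z,
#   P(Y | do(X=x)) = sum_z P(Y | X=x, Z=z) * P(Z=z)
def p_do(x_val):
    total = 0.0
    for z_val in (False, True):
        stratum = (x == x_val) & (z == z_val)
        total += y[stratum].mean() * (z == z_val).mean()
    return total

interv = p_do(True) - p_do(False)

print(f"association:  {assoc:+.3f}")
print(f"intervention: {interv:+.3f}")
```

Under these assumptions the raw association overstates the benefit several-fold, because treated patients are also the less frail ones. The adjusted contrast recovers the structural effect only because Z is measured and is the sole confounder in this toy model.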

The Fundamental Problem of Protocol-Level Causal Inference

The same trial average can be unbiased for the population and badly biased for some subpopulations. A causal model identifies the subgroups where the average breaks. Without one, the failure is invisible until the adverse-event review — by which time the patient has already absorbed the consequence.

Average Treatment Effect (ATE)

The ATE is what trials report and guidelines apply: the expected outcome difference if every eligible patient received the treatment, versus if none did.

ATE = E[Y(1) − Y(0)]

The ATE is the protocol-level estimand. It answers "is the rule worth having?" — not "does the rule apply to this patient?". Guidelines that present only an ATE leave the second question unanswered, transferring the structural inference to the clinician at the point of care.
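Why a randomised trial's mean difference estimates the ATE can be written out in two steps. The treatment indicator T is notation introduced here for illustration; the derivation assumes randomisation (T is independent of the potential outcomes) and consistency (the observed Y equals Y(t) for patients with T = t).

```latex
\begin{aligned}
\mathrm{ATE} &= E[Y(1)] - E[Y(0)] \\
             &= E[Y(1) \mid T = 1] - E[Y(0) \mid T = 0]
             && \text{(randomisation: } T \perp\!\!\!\perp (Y(0), Y(1))\text{)} \\
             &= E[Y \mid T = 1] - E[Y \mid T = 0]
             && \text{(consistency)}
\end{aligned}
```

Both steps hold within the trial population. Neither says anything about a patient who would not have been eligible for the trial.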

Conditional Average Treatment Effect (CATE)

The CATE refines the ATE by stratifying on observed covariates. It is the operational target of protocol-level personalisation: not "what works on average" but "for whom does the average hold".

τ(x) = E[Y(1) − Y(0) | X = x]

When the CATE varies substantially across strata, the protocol's uniform rule masks meaningful heterogeneity. Identifying the strata where the protocol fails — and the structural reason for the failure — is the contribution of a causal model to guideline writing. Most flat protocols in current use have not been tested against their own CATE.
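A minimal sketch of how a favourable ATE can hide a harmful stratum. It simulates a hypothetical 1:1 randomised trial in which the treatment helps patients with normal renal function and harms those with impaired function; the covariate, the prevalence, and the effect sizes are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical randomised trial; covariate and effect sizes are assumptions.
renal = rng.random(n) < 0.25          # covariate X: impaired renal function
treated = rng.random(n) < 0.5         # randomised 1:1, independent of X
# Treatment lowers event risk by 0.08 when renal function is normal,
# but raises it by 0.04 when it is impaired.
effect = np.where(renal, 0.04, -0.08)
y = rng.random(n) < 0.20 + effect * treated

def mean_diff(mask):
    """Difference in event rates, treated minus control, within `mask`."""
    return y[mask & treated].mean() - y[mask & ~treated].mean()

ate = mean_diff(np.ones(n, dtype=bool))
cate = {
    "normal renal": mean_diff(~renal),
    "impaired renal": mean_diff(renal),
}

print(f"ATE: {ate:+.3f}")
for label, tau in cate.items():
    print(f"CATE[{label}]: {tau:+.3f}")
```

The pooled estimate is negative (fewer events), so a flat rule looks justified, while the stratum-specific estimate for impaired renal function has the opposite sign. Stratified differences are unbiased here only because treatment was randomised; with observational data each stratum would need its own confounding adjustment.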

Iatrogenic Effect Decomposition

When polypharmacy creates an adverse outcome, the question is not "did the disease progress?" but "how much of the symptom is the disease, and how much is the treatment of another condition?" An SCM decomposes the observed outcome into the contribution from disease progression and the contribution from drug-on-drug interaction:

1. Identify the two cause paths

Draw the DAG showing both pathways into the observed symptom: disease → symptom, and co-medication → drug interaction → symptom. Each path is a separable contribution that the structural model holds apart.

2. Estimate the drug-interaction path

Use the patient's polypharmacy record and known pharmacokinetic interactions to estimate the portion of the symptom load attributable to the drug-on-drug pathway. The reference point is a counterfactual: what the symptom would be in the absence of the co-medication, holding disease severity fixed.

3. Choose the intervention

If the drug-interaction contribution is large, de-prescribing one of the interacting agents is structurally indicated — increasing the dose of the symptomatic treatment is contraindicated. The model converts a clinical impasse into an arithmetic comparison.
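The three steps above can be sketched for a single patient using a toy linear structural equation. The equation, its coefficients, and the `Patient` fields are hypothetical; a real model would take its interaction term from pharmacokinetic evidence. The counterfactual follows the usual abduction, action, prediction sequence.

```python
from dataclasses import dataclass

# Hypothetical linear structural equation (coefficients are assumptions):
#   symptom = BETA_DISEASE * disease + BETA_INTERACTION * (drug_a * drug_b) + u
BETA_DISEASE = 2.0
BETA_INTERACTION = 3.0

@dataclass
class Patient:
    disease: float    # disease severity score
    drug_a: int       # 1 if on the symptomatic treatment
    drug_b: int       # 1 if on the interacting co-medication
    symptom: float    # observed symptom score

def structural_symptom(disease, drug_a, drug_b, u):
    return BETA_DISEASE * disease + BETA_INTERACTION * drug_a * drug_b + u

def decompose(p: Patient) -> dict:
    # Abduction: recover this patient's noise term from the observed record.
    u = p.symptom - structural_symptom(p.disease, p.drug_a, p.drug_b, 0.0)
    # Action and prediction: remove the co-medication, hold disease severity fixed.
    without_b = structural_symptom(p.disease, p.drug_a, 0, u)
    return {
        "disease_and_residual": without_b,
        "iatrogenic": p.symptom - without_b,
    }

patient = Patient(disease=1.5, drug_a=1, drug_b=1, symptom=6.5)
parts = decompose(patient)
print(parts)
```

For this made-up record the interaction path accounts for 3.0 of the 6.5 symptom points, which is the kind of comparison step 3 relies on. The split is only as trustworthy as the assumed equation; a misspecified interaction term shifts the attribution silently.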

"A guideline is a causal claim about a population. The same guideline applied to a specific patient is a different causal claim — about that patient. The two claims need not be true together."
— Hernán, M. & Robins, J., Causal Inference: What If, 2020

Backdoor Adjustment
Core idea: Control for a set of measured covariates that blocks all confounding paths from treatment to outcome.
Best when: The clinical context is well-understood and the relevant covariates are recorded in the EHR.
Limitation: Fails silently when an unmeasured confounder remains; cannot distinguish a structural failure from a coding gap.

Instrumental Variables
Core idea: Exploit a source of variation in treatment (e.g., physician prescribing preference, formulary changes) that affects the outcome only through the treatment and is independent of unmeasured confounders.
Best when: Variation across providers or institutions can be treated as plausibly random.
Limitation: Valid instruments are rare; weak instruments give wide and biased estimates.

Transport Formula
Core idea: Re-weight or re-estimate a trial's effect to a target population whose covariate distribution differs from the trial's.
Best when: The trial population and the clinical population differ on identifiable covariates the trial recorded.
Limitation: Requires correct specification of the selection mechanism; collapses when the target population includes covariate values absent from the trial.

Mediation Analysis
Core idea: Decompose the treatment's effect into a direct effect on the outcome and an indirect effect through a measured mediator.
Best when: The protocol's mechanism passes through an identifiable intermediate variable (a biomarker, a downstream complication).
Limitation: Requires no unmeasured mediator-outcome confounding; sensitive to mediator measurement error.

Negative Control Outcomes
Core idea: Choose an outcome that should be unaffected by the treatment but shares the confounding structure; an apparent effect on the negative control signals residual bias on the target.
Best when: A plausible negative control outcome exists and is reliably recorded in the same dataset.
Limitation: Requires structural similarity between the negative and target outcomes; misidentified controls give false reassurance.

Sensitivity Analysis
Core idea: Bound the causal estimate under specified violations of unconfoundedness (Rosenbaum bounds, E-values).
Best when: The estimate is to be defended in a regulatory or governance setting where the unconfoundedness assumption will be challenged.
Limitation: Sets bounds, not point estimates; does not identify which confounder is missing, only the strength such a confounder would need.

Mechanistic Constraints
Core idea: Encode known pharmacokinetic or physiological mechanisms as structural equations the model cannot violate.
Best when: The clinical mechanism is well-characterised independently of the data (drug interactions, dose-response curves).
Limitation: Constraints are only as good as the published mechanism; out-of-distribution patients can fall outside the validated range.
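As one concrete instance of the sensitivity-analysis strategy, the E-value for a risk ratio has a closed form (VanderWeele and Ding, 2017). The sketch below applies it to a risk ratio of 0.75, chosen here only to echo the 25% reduction used earlier; the function name is hypothetical.

```python
import math

def e_value(rr: float) -> float:
    """E-value for a point-estimate risk ratio.

    Protective ratios (rr < 1) are inverted first, then
    E = RR + sqrt(RR * (RR - 1)) is applied.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))

ev = e_value(0.75)
print(round(ev, 2))
```

An E-value of about 2 means an unmeasured confounder would need a risk-ratio association of roughly 2 with both treatment and outcome, beyond the measured covariates, to explain the estimate away entirely. It says how strong such a confounder must be, not whether one exists.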

Handling Time-Varying Treatments

Chronic-disease protocols are not single decisions but sequences. Statin therapy is reviewed and titrated over years; polypharmacy regimens evolve as comorbidities accumulate. Standard adjustment for baseline covariates is insufficient when intermediate variables — current LDL, current renal function, current symptom burden — are both confounders of later decisions and mediators of earlier ones. Marginal structural models and G-estimation are the standard tools for this setting; they decouple the time-varying confounding from the time-varying mediation that standard regression cannot tell apart.
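For reference, the stabilised inverse-probability-of-treatment weights typically used to fit a marginal structural model can be written as below. The notation is introduced here rather than taken from the text: A_t is the treatment decision at visit t, L_t the time-varying covariates (current LDL, renal function, symptom burden), an overbar denotes history up to that visit, and K is the last visit.

```latex
sw_i \;=\; \prod_{t=0}^{K}
  \frac{f\!\left(A_{i,t} \mid \bar{A}_{i,t-1}\right)}
       {f\!\left(A_{i,t} \mid \bar{A}_{i,t-1},\, \bar{L}_{i,t}\right)}
```

Weighting each patient by sw_i creates a pseudo-population in which treatment at each visit no longer depends on the measured covariate history, so the covariates can stay out of the outcome model and their mediating role is left intact. The construction assumes no unmeasured confounding at any visit and a non-zero probability of each treatment option for every covariate history.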

Transport Failure as the Default Mode

A trial population almost never matches the clinical population the protocol is applied to. The age distribution differs; the comorbidity burden differs; the polypharmacy environment differs. A guideline that does not name these differences and the structural assumptions under which the average still holds is silently extrapolating. SCMs make the extrapolation visible and bound-able.
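In its simplest form, transporting a trial effect means re-weighting stratum-specific effects by the target population's covariate mix. The sketch below uses invented age strata, effects, and mixes; it assumes the stratum-specific effects are the same in both populations, which is exactly the structural assumption a guideline would need to name.

```python
# Hypothetical stratum-specific risk differences from a trial, and the age mix
# in the trial versus the target clinic. All numbers are assumptions.
trial_effect = {"<65": -0.06, "65-79": -0.03, "80+": 0.01}
trial_mix = {"<65": 0.60, "65-79": 0.35, "80+": 0.05}
clinic_mix = {"<65": 0.20, "65-79": 0.45, "80+": 0.35}

def weighted_effect(effect_by_stratum: dict, mix: dict) -> float:
    """sum_z effect(z) * P(Z=z); refuses strata the trial never observed."""
    missing = [s for s, w in mix.items() if w > 0 and s not in effect_by_stratum]
    if missing:
        raise ValueError(f"no trial evidence for strata: {missing}")
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("stratum weights must sum to 1")
    return sum(effect_by_stratum[s] * w for s, w in mix.items())

in_trial = weighted_effect(trial_effect, trial_mix)
in_clinic = weighted_effect(trial_effect, clinic_mix)
print(f"effect in trial population:  {in_trial:+.4f}")
print(f"effect in clinic population: {in_clinic:+.4f}")
```

With these made-up numbers the expected benefit in the clinic is less than half the trial figure, purely because the clinic is older. The explicit error for unseen strata mirrors the point that re-weighting cannot extrapolate to covariate values the trial never recorded.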

Each case below illustrates a distinct failure mode of population-level evidence applied as protocol, and how an SCM identifies the failure from EHR data the system already records.

The companion hub Personalized Medicine runs the same critique at the individual-counterfactual level — Rung 3 questions about a specific patient, not Rung 1 critiques of a population rule. The two hubs share a thesis and divide the work: clinical decision-making is about the rule; personalized medicine is about the rule's application to the individual.

The same six-step procedure governs every case in this hub. Steps 1–3 turn a guideline-level claim into a causal estimand identifiable from the available data. Steps 4–6 choose an estimator, stress-test the structural assumptions, and validate the result against external evidence — producing an artefact a clinical-governance committee can defend.

1. Draw the DAG

Work with clinical leads to encode the protocol's implicit causal structure as a directed acyclic graph. Every node is a variable; every edge is a direct causal mechanism the protocol relies on. Hidden variables — unmeasured severity, unrecorded co-medication, undiagnosed comorbidity — appear as exogenous noise nodes U whose distribution must be estimated or bounded. The DAG is a statement of clinical knowledge that data refines but cannot overturn.

2. Identify the Estimand

Translate the protocol's claim into a precise causal target: ATE (does the rule work on average in this population?), CATE (does it work for this subgroup?), counterfactual (would this patient have benefited under the alternative?), or transport (does the trial's effect carry to this clinical setting?). Each estimand demands different identification assumptions and different data structure.

3. Check Identifiability

Apply the backdoor criterion, frontdoor criterion, or do-calculus to determine whether the estimand is identifiable from the EHR data available. If unmeasured confounders block identification, escalate to instrumental variables, proxy variables, or mechanistic constraints. If the estimand is not identifiable, document this explicitly: no audit committee should be asked to defend a number that no method can produce.

4. Choose an Estimator

Match the estimator to the data and the estimand: outcome regression with backdoor adjustment when all confounders are measured; instrumental variables when confounding is unmeasured but a valid instrument exists; marginal structural models or G-estimation for time-varying treatments and confounders; mediation analysis for decomposing effect pathways. The choice is structural, not aesthetic: the wrong estimator gives the wrong number even with the right DAG.

5. Assess Sensitivity

Every protocol-level estimate rests on untestable assumptions. Quantify robustness: Rosenbaum bounds and E-values for hidden confounding; negative control outcomes for structural bias detection; refutation tests for placebo treatment and random common cause. A guideline change that cannot survive its own sensitivity analysis should not be made.

6. Validate Causally

Predictive validation (held-out AUC) is insufficient — a model can predict accurately and still be causally wrong. Validate against external trial evidence, against natural experiments (a formulary change, a guideline update), or against negative control outcomes that share confounding structure. Where no external benchmark is available, document the structural assumptions and their clinical implications so the governance committee can see what is being assumed on their behalf.
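Steps 1 and 3 can be sketched with the standard library alone. The DAG below is a hypothetical statin protocol written as a map from each node to its parents; the node names and the unmeasured `U_severity` are illustrative. The check relies on a standard result: the parents of the treatment block every backdoor path, so they form a valid adjustment set provided all of them are measured.

```python
from graphlib import TopologicalSorter, CycleError

# Hypothetical protocol DAG, written as child -> set of parents.
# Node names are illustrative; "U_severity" stands for an unmeasured variable.
parents = {
    "ldl": {"age"},
    "statin": {"age", "ldl", "U_severity"},
    "event": {"statin", "ldl", "age", "U_severity"},
}
measured = {"age", "ldl", "statin", "event"}

def check_acyclic(graph: dict) -> list:
    """Return a topological order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(graph).static_order())

def parent_adjustment_set(graph: dict, treatment: str, observed: set):
    """Split the treatment's parents into measured and unmeasured sets."""
    needed = set(graph.get(treatment, set()))
    return needed & observed, needed - observed

order = check_acyclic(parents)
adjust_for, unmeasured = parent_adjustment_set(parents, "statin", measured)

print("topological order:", order)
print("adjust for:", sorted(adjust_for))
if unmeasured:
    print("backdoor adjustment via parents is blocked by:", sorted(unmeasured))
```

Here the unmeasured parent means adjustment on the treatment's parents is not available, which is the situation step 3 says to escalate or document. A full identifiability check would search for other valid adjustment sets or apply do-calculus; this sketch only covers the simplest sufficient condition.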

Average Treatment Effect (ATE)
E[Y(1) − Y(0)]: the expected outcome difference if every member of the population received the treatment versus if none did. The protocol-level estimand: "is this rule worth having?". Reported by trials and assumed by guidelines, but the same ATE can be unbiased for the population and badly biased for identifiable subpopulations.
Conditional Average Treatment Effect (CATE)
τ(x) = E[Y(1) − Y(0) | X = x]: the expected treatment effect for patients with covariate profile x. The operational target of protocol personalisation. Identifiable from observational data under unconfoundedness and overlap; reveals the heterogeneity a flat protocol masks.
do-calculus
A formal algebra developed by Pearl for deriving interventional distributions P(Y | do(X=x)) from observational data and a causal graph. The do-operator severs a variable's connection to its parents, simulating an external intervention rather than passive observation — converting the protocol's implicit causal claim into a checkable formula.
Transport Formula
A do-calculus derivation that re-weights a trial's causal estimate to a target population whose covariate distribution differs from the trial's. Names the structural assumptions under which extrapolation is valid, and identifies the assumptions whose failure invalidates the extrapolation.
Time-Varying Confounding
Confounding that evolves with the treatment course. Intermediate variables (current LDL, current renal function) are confounders of later treatment decisions and mediators of earlier ones; standard regression conflates the two. Marginal structural models and G-estimation are designed for this setting.
Iatrogenic Decomposition
Separation of an observed symptom into the contribution from disease progression and the contribution from drug-on-drug interaction. Each is a distinct path in the DAG; treating only the disease term when the iatrogenic term is dominant escalates the wrong mechanism. The decomposition is the structural reason for de-prescribing.
Negative Control Outcome
An outcome the treatment cannot causally affect but which shares the confounding structure of the target outcome. An apparent effect on the negative control outcome signals residual bias that is likely to affect the target estimate as well. A primary technique for detecting unmeasured confounding when an instrumental variable is unavailable.
Identifiability
A causal estimand is identifiable if it can be expressed as a function of the observational distribution P(V) using the causal graph alone, without requiring unobserved variables or experimental data. Backdoor and frontdoor criteria are the standard tools. A guideline whose target estimand is not identifiable in the available data is making a claim no analysis can defend.

Pearl, J. & Mackenzie, D. (2018). The Book of Why. Basic Books.
Pearl, J. (2009). Causality: Models, Reasoning, and Inference. Cambridge University Press.
Künzel, S.R. et al. (2019). "Metalearners for estimating heterogeneous treatment effects." PNAS.
Wager, S. & Athey, S. (2018). "Estimation and inference of heterogeneous treatment effects using random forests." JASA.
Shalit, U. et al. (2017). "Estimating individual treatment effect: generalization bounds and algorithms." ICML.