An engagement that produces a useful Rung 3 model is built in three phases. The first two are not modeling. Modeling begins in the third, and only after the earlier work has narrowed the question portfolio to what the available evidence can actually answer.
The procedure exists because the default order (start with a question, find data that fits, build a model) has a high and predictable failure rate. The corrective is structural: audit what is available before deciding what is worth asking.
How engagements fail
Four failure modes account for most projects that consume budget and produce no deployable model. Each has the same root: the discovery procedure was skipped or compressed.
Failure 1 — The model was built before the audit
The team commits to a model design in the kickoff meeting. Two months in, the data does not support the estimand. The structural variable the model needs was never recorded in the EHR / ERP / data warehouse. The team rebuilds toward a weaker question or, more commonly, the project is quietly retired. The cost was knowable on day one from an evidence inventory that was never produced.
Failure 2 — The question was never categorized
The stakeholder says "what causes customer churn." The team builds a churn classifier — a Rung 1 associational model. The stakeholder then asks "what should we do to reduce churn" — a Rung 2 interventional question — and is told to look at feature importance. Feature importance is not an intervention. The model is the wrong tool, but no one in the room noticed because the categorization step was never performed. The model deploys, the recommendations fail, the team is blamed for the wrong reasons.
Failure 3 — Expert priors were not collected
The DAG is drawn from data alone. The senior actuary, the chief clinician, the principal engineer with thirty years of process knowledge — none of them were asked, because their input wasn't seen as data. The model fights the experts on edge cases; the experts dismiss the model as naïve; the audit committee receives a model that the people who would have to defend it don't trust. The expertise was available and free. It was not collected because the team did not have a place to put it.
Failure 4 — The development cycle was run as waterfall
The first DAG is locked. Defending it becomes the project's deliverable, rather than refining it against stakeholder feedback. By month three the DAG is wrong in ways the team can identify but is no longer permitted to change. The cycle was designed to revise. It is being defended.
What follows is the discovery procedure those four failures skipped.
Audit first, ask second
The default order — state the question, find supporting data, build a model — is the order a stakeholder naturally proposes. It is also the order that produces the failures above. The corrective is to invert the first two steps: enumerate the available evidence before committing to which questions are worth asking.
The argument for the inversion is structural, not stylistic. Three properties of evidence-first ordering matter:
- The question portfolio is bounded by the evidence frontier. Questions that cannot be answered with the data and expertise available become explicit requests for data acquisition, rather than implicit modeling failures discovered two months later.
- Expert priors enter as inputs, not corrections. When the audit precedes the question, the senior clinician's prior on a dose-response curve is collected as a structural assumption that the model will respect. When the audit follows the question, the same prior arrives as a complaint about the model's output.
- The procedure terminates honestly. An engagement that completes Phase 1 and discovers the data does not support any answerable Rung 3 question has produced a real artifact — the evidence inventory itself, which is often the basis for a data-engineering recommendation. An engagement that starts with the question and ends without a model has produced nothing.
This is a methodological commitment, not a procedural preference. A Rung 3 engagement built the other way is not a Rung 3 engagement — it is an analytics project with structural-causal-model branding. The order is the position.
Phase 1 — Audit data sources
The audit enumerates every input that could contribute to the model — quantitative records, qualitative expert knowledge, codified domain literature — and characterizes each one by its coverage, biases, and the kind of claim it can support.
What gets audited
Three categories, in order of how often they are missed:
Quantitative records. Transactional and operational data — EHR/CRM/ERP extracts, sensor streams, claims histories, sales pipelines, prescription registries. The audit asks not just what is recorded but what is recorded faithfully. A field that exists in the schema but is back-filled by a default value for 60% of records is a different audit finding than one that is missing entirely. Both findings constrain which estimands are identifiable.
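The distinction between a field that is missing and a field that is back-filled with a default can be checked mechanically. The following is a minimal sketch, assuming records arrive as a list of dictionaries; the function name `audit_field`, the field `smoking_status`, and the default value `"unknown"` are hypothetical and chosen only for illustration.

```python
from collections import Counter

def audit_field(records, field, suspected_defaults=()):
    """Summarize how faithfully one field is recorded across dict records."""
    total = len(records)
    values = [r.get(field) for r in records]
    missing = sum(1 for v in values if v is None)
    defaulted = sum(1 for v in values if v is not None and v in suspected_defaults)
    observed = [v for v in values if v is not None]
    # The most common value is a candidate default worth raising with the data owner.
    top_value, top_count = Counter(observed).most_common(1)[0] if observed else (None, 0)
    return {
        "field": field,
        "missing_rate": missing / total if total else 0.0,
        "default_rate": defaulted / total if total else 0.0,
        "most_common_value": top_value,
        "most_common_share": top_count / total if total else 0.0,
    }

# Hypothetical extract: 6 of 10 rows carry the default, 1 row is missing.
rows = (
    [{"patient_id": i, "smoking_status": "unknown"} for i in range(6)]
    + [
        {"patient_id": 6, "smoking_status": "never"},
        {"patient_id": 7, "smoking_status": "current"},
        {"patient_id": 8, "smoking_status": "former"},
        {"patient_id": 9, "smoking_status": None},
    ]
)
report = audit_field(rows, "smoking_status", suspected_defaults=("unknown",))
print(report)
```

A high `default_rate` and a high `missing_rate` are separate inventory findings: the first says the field looks populated but is not informative, the second says it is visibly absent.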
Qualitative expert input. Subject-matter interviews structured to elicit causal claims, not opinions. A clinician's belief that "the dose-response curve is flat above 40mg" is a structural claim about a variable in the eventual DAG. A process engineer's belief that "failures cluster after the third maintenance cycle" is a claim about a mediator or a moderator. These are not anecdotes; they are constraints on the structural model. Collected systematically, they reduce the space of plausible DAGs faster than additional data would.
Codified domain knowledge. Regulatory documents, published clinical guidelines, actuarial tables, engineering specifications. These are typically treated as background and ignored. In a Rung 3 engagement they are inputs — they constrain the structural model the way a regulatory requirement constrains a building's structural model. A guideline that says "patients with renal function below X should receive dose Y" is a structural relationship in the patient-treatment DAG.
Company expertise is a data source
The distinctive claim of a Rung 3 engagement is that expert knowledge is evidence, not commentary. Standard machine learning treats experts as users of the model's outputs. The structural-causal-model framework treats them as contributors to the model's inputs — through their priors on parameter values, their judgments about graph structure, and their constraints on identifiability assumptions. This is not a softening of the data-driven ideal; it is a recognition that ignoring an actuary with thirty years of loss-curve experience is no more "data-driven" than ignoring a measured covariate. Both are evidence.
The audit therefore includes structured interviews with the people who will eventually be asked to defend the model. Their priors, judgments, and edge-case experience become inputs to Phase 3, not late-stage corrections.
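One way to give expert input "a place to put it" is to record each causal claim as a structured edge with its source attached. The sketch below is an illustration, not a prescribed format: the `ExpertClaim` class, the `build_dag` helper, and the variable names are all assumptions. It uses the standard-library `graphlib` module (Python 3.9+) to confirm that the asserted edges form an acyclic graph.

```python
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter

@dataclass(frozen=True)
class ExpertClaim:
    cause: str
    effect: str
    source: str      # who asserted the edge
    rationale: str   # the claim in the expert's own words

def build_dag(claims):
    """Return (parents, order): parent sets per node and one causal ordering.

    Raises ValueError if the asserted edges contain a cycle.
    """
    parents = {}
    for c in claims:
        parents.setdefault(c.cause, set())
        parents.setdefault(c.effect, set()).add(c.cause)
    try:
        # TopologicalSorter expects a mapping of node -> predecessors.
        order = list(TopologicalSorter(parents).static_order())
    except CycleError as err:
        raise ValueError(f"expert claims are cyclic: {err.args[1]}") from err
    return parents, order

# Hypothetical claims elicited in structured interviews.
claims = [
    ExpertClaim("renal_function", "dose", "clinical guideline",
                "dose is set according to renal function"),
    ExpertClaim("dose", "response", "chief clinician",
                "dose-response curve is flat above 40mg"),
    ExpertClaim("renal_function", "response", "chief clinician",
                "impaired clearance changes response directly"),
]
parents, order = build_dag(claims)
print(order)
```

Keeping `source` and `rationale` on every edge means a later disagreement about the graph can be traced to a specific person and statement rather than to the model.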
What the audit produces
An evidence inventory: a written document listing each source, its coverage, its known biases, and the kind of claim it can support. The inventory is the basis on which Phase 2 evaluates which questions are answerable. It is committable on its own — many engagements stop after Phase 1 with a clear-eyed assessment of what data acquisition would be required to make Phase 3 possible. That assessment is the deliverable.
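The inventory is described here as a written document, but its entries have a regular shape that can also be kept in machine-readable form. The following is a sketch under that assumption; the `EvidenceSource` class, its field names, the claim-type labels, and the example sources are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class EvidenceSource:
    name: str
    kind: str        # "quantitative", "expert", or "codified"
    coverage: str    # free-text description of what the source covers
    known_biases: list[str] = field(default_factory=list)
    supported_claims: list[str] = field(default_factory=list)

def sources_supporting(inventory, claim):
    """Names of the sources that can support a given kind of claim."""
    return [s.name for s in inventory if claim in s.supported_claims]

inventory = [
    EvidenceSource(
        name="claims_history",
        kind="quantitative",
        coverage="2019 onward, all regions",
        known_biases=["late-reported claims under-counted in the latest quarter"],
        supported_claims=["associational"],
    ),
    EvidenceSource(
        name="senior_actuary_interview",
        kind="expert",
        coverage="loss-curve shape for commercial lines",
        known_biases=["experience concentrated in one market"],
        supported_claims=["structural prior"],
    ),
    EvidenceSource(
        name="pricing_pilot",
        kind="quantitative",
        coverage="randomized discount offer, one region",
        supported_claims=["associational", "interventional"],
    ),
]

print(sources_supporting(inventory, "interventional"))  # ['pricing_pilot']
inventory_json = json.dumps([asdict(s) for s in inventory], indent=2)
```

Serializing the inventory makes it easy to version alongside the written document and to query during Phase 2.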
Phase 2 — Collect and categorize questions
With the evidence inventory in hand, Phase 2 collects the questions stakeholders want answered — and places each one on Pearl's ladder of causation. The categorization determines which methodology applies; the cross-check against the inventory determines whether the question can be answered at all.
Step A — Collect questions in their original phrasing
Resist the urge to translate stakeholder questions into causal notation. The phrasing itself carries information — about which rung the stakeholder believes they are asking, which often differs from the rung the question actually occupies. A stakeholder who says "which customers are about to churn?" is asking a Rung 1 question. A stakeholder who says "how do we keep them?" is asking a Rung 2 question. The two phrasings often come from the same person in the same meeting, and the team is expected to deliver "one model." Categorization makes the distinction visible.
Step B — Place each question on the ladder
Rung 1 — Associational. "What is?" Questions about patterns and correlations in observed data. "Which patients have the highest readmission rates?" "What features predict default?" Answerable from observational data alone. The tools are predictive models, classifiers, statistical summaries.
Rung 2 — Interventional. "What if we do?" Questions about the effect of an action we take. "What happens to readmission if we add a follow-up call?" "Does the marketing campaign increase conversion, or does the audience selection?" Requires a causal model. Cannot be answered by observational data alone — even with infinite data — because the question asks about a distribution we did not sample from.
Rung 3 — Counterfactual. "What would have been?" Questions about a world that did not happen, given what did. "Would this specific patient have responded under the alternative protocol?" "If this customer hadn't received the retention offer, would they have churned anyway?" Requires a structural model and the strongest assumptions. The questions a clinical-governance committee or a regulatory audit actually asks.
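To make the Rung 3 case concrete, here is a toy version of the three-step counterfactual procedure (abduction, action, prediction). It assumes a deliberately simple linear structural equation, Y = effect * X + U, with a known effect size; a real engagement would not have a model this simple, and the function name and numbers are illustrative only.

```python
def counterfactual_outcome(x_obs, y_obs, x_alt, effect):
    """Counterfactual Y for one unit in the toy SCM  Y = effect * X + U."""
    # 1. Abduction: recover this unit's noise term from what was observed.
    u = y_obs - effect * x_obs
    # 2. Action: replace the observed treatment with the alternative.
    x = x_alt
    # 3. Prediction: push the same noise term through the modified model.
    return effect * x + u

# Hypothetical patient: treated (X=1), observed outcome 7.0, assumed effect 2.0.
y_cf = counterfactual_outcome(x_obs=1, y_obs=7.0, x_alt=0, effect=2.0)
print(y_cf)  # 5.0
```

The answer depends entirely on the assumed structural equation, which is why Rung 3 questions carry the strongest assumptions.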
Step C — Cross-check against the inventory
Each categorized question is checked against the Phase 1 evidence inventory. Questions whose data the inventory does not contain become explicit data-acquisition requests, not implicit modeling failures. Questions whose expert priors are missing become explicit interview requests. Questions that pass the cross-check enter the engagement's answerable portfolio — the set of questions Phase 3 will actually develop models for.
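The cross-check is essentially a set comparison between what each question needs and what the inventory holds. The sketch below illustrates that logic under stated assumptions: the `Question` class, the `cross_check` function, and every source and prior name are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Question:
    text: str    # the stakeholder's original phrasing
    rung: int    # 1, 2, or 3 on the ladder
    needs_data: set = field(default_factory=set)
    needs_priors: set = field(default_factory=set)

def cross_check(questions, data_sources, expert_priors):
    """Split questions into the answerable portfolio and explicit requests."""
    answerable, data_requests, interview_requests = [], {}, {}
    for q in questions:
        missing_data = q.needs_data - data_sources
        missing_priors = q.needs_priors - expert_priors
        if missing_data:
            data_requests[q.text] = sorted(missing_data)
        if missing_priors:
            interview_requests[q.text] = sorted(missing_priors)
        if not missing_data and not missing_priors:
            answerable.append(q)
    return answerable, data_requests, interview_requests

# Hypothetical Phase 1 inventory contents.
data_sources = {"ehr_admissions", "discharge_records"}
expert_priors = {"follow_up_call_mechanism"}

questions = [
    Question("Which patients have the highest readmission rates?", 1,
             needs_data={"ehr_admissions"}),
    Question("What happens to readmission if we add a follow-up call?", 2,
             needs_data={"ehr_admissions", "call_logs"},
             needs_priors={"follow_up_call_mechanism"}),
    Question("Would this patient have responded under the alternative protocol?", 3,
             needs_data={"ehr_admissions"},
             needs_priors={"protocol_response_mechanism"}),
]

answerable, data_requests, interview_requests = cross_check(
    questions, data_sources, expert_priors
)
print([q.text for q in answerable])
print(data_requests)
print(interview_requests)
```

In this example only the Rung 1 question enters the answerable portfolio; the other two turn into a data-acquisition request and an interview request respectively.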
The portfolio is typically smaller than the original question set. That reduction is a feature, not a failure. A focused portfolio of three answerable questions produces three defensible models. A broad portfolio of ten aspirational questions produces ten compromised ones.
Phase 3 — Begin the SCM development cycle
With the answerable portfolio defined, Phase 3 begins the iterative cycle of structural causal model development. The cycle is not a sequence to be completed; it is a loop designed to terminate at any turn with a defensible artifact.
One turn of the cycle:
- Draw the initial DAG from the Phase 1 expert input. Every node is a variable; every edge is a causal mechanism the experts assert. Unmeasured variables appear as exogenous noise nodes. This is a written statement of clinical, actuarial, or process knowledge — data refines it but cannot overturn it.
- Select the highest-priority estimand from the Phase 2 portfolio. Translate it into a precise causal target: an average treatment effect (ATE), a conditional average treatment effect (CATE), a unit-level counterfactual, or a transported effect.
- Check identifiability. Apply the backdoor criterion, the front-door criterion, or the general do-calculus to determine whether the estimand can be expressed as a function of the available data. If it cannot, document this and escalate to instrumental variables or mechanistic constraints; if neither helps, return to Phase 1 with a specific data-acquisition request.
- Estimate with an estimator matched to the data and the estimand. Outcome regression with backdoor adjustment, marginal structural models for time-varying confounding, mediation analysis for effect decomposition.
- Run sensitivity analysis. Quantify robustness to the structural assumptions the DAG encodes: Rosenbaum bounds and E-values for unmeasured confounding, negative-control outcomes for structural bias, and placebo-treatment refutation tests for spurious effects.
- Present to stakeholders. The deliverable is the DAG, the estimand, the point estimate, the sensitivity bounds, and the explicit assumptions. Each is defensible to a governance committee on its own.
- Revise the DAG from stakeholder feedback. Stakeholders frequently identify missing edges or implausible assumptions only after seeing the model's output. Their feedback is data. Incorporate it and start the next turn.
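The identification and estimation steps in the list above can be illustrated with the simplest case, backdoor adjustment by stratification. This sketch assumes a single binary confounder Z (say, patient severity) that satisfies the backdoor criterion for treatment X and outcome Y. The counts are invented so the arithmetic is exact; they are not drawn from any real engagement.

```python
# Invented population counts keyed by (z, x, y): z = severity, x = treated, y = recovered.
counts = {
    (0, 1, 1): 80,  (0, 1, 0): 20,
    (0, 0, 1): 280, (0, 0, 0): 120,
    (1, 1, 1): 160, (1, 1, 0): 240,
    (1, 0, 1): 30,  (1, 0, 0): 70,
}

def p_y1(counts, x, z=None):
    """P(Y=1 | X=x), or P(Y=1 | X=x, Z=z) when z is given."""
    rows = {k: n for k, n in counts.items()
            if k[1] == x and (z is None or k[0] == z)}
    total = sum(rows.values())
    return sum(n for k, n in rows.items() if k[2] == 1) / total

def naive_difference(counts):
    """Rung 1 contrast: difference in observed recovery rates."""
    return p_y1(counts, 1) - p_y1(counts, 0)

def backdoor_ate(counts):
    """Rung 2 estimate: sum over z of P(z) * [P(Y=1|X=1,z) - P(Y=1|X=0,z)]."""
    n = sum(counts.values())
    ate = 0.0
    for z in sorted({k[0] for k in counts}):
        p_z = sum(c for k, c in counts.items() if k[0] == z) / n
        ate += p_z * (p_y1(counts, 1, z) - p_y1(counts, 0, z))
    return ate

print(f"naive difference: {naive_difference(counts):+.2f}")  # -0.14
print(f"backdoor ATE:     {backdoor_ate(counts):+.2f}")      # +0.10
```

In these invented numbers treatment looks harmful in the raw comparison because sicker patients are treated more often, while the adjusted estimate is positive. The adjusted figure is only as credible as the assumption that Z closes every backdoor path, which is what the sensitivity step is meant to probe.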
Each turn produces a complete artifact. The engagement can terminate after one turn — and often should, when the first turn's answer is decisive — or continue for several, refining the structural model as the portfolio's questions are addressed in priority order.
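Of the sensitivity tools named in the cycle, the E-value is the simplest to compute by hand. The sketch below applies the standard E-value formula for a risk-ratio point estimate, RR + sqrt(RR * (RR - 1)), inverting the ratio first when it is below 1. The function name and the example risk ratios are illustrative assumptions.

```python
import math

def e_value(rr):
    """E-value for a risk-ratio point estimate."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr  # protective effects are inverted before applying the formula
    return rr + math.sqrt(rr * (rr - 1))

for rr in (1.0, 1.5, 2.0, 0.5):
    print(f"RR = {rr:>3}: E-value = {e_value(rr):.2f}")
```

An E-value of about 3.41 for a risk ratio of 2.0 means an unmeasured confounder would need to be associated with both treatment and outcome by a risk ratio of roughly 3.41 each to fully explain away the estimate. A value close to 1 signals that the estimate is fragile.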
For technical depth on the individual steps, see the companion pages on Pearl's ladder of causation, structural causal models, the three-step counterfactual procedure, identification, and sensitivity analysis.
Artifacts and timing
Each phase produces a deliverable that stands on its own, so the engagement can stop after any phase with a useful product.
| Phase | Deliverable | Typical timing |
|---|---|---|
| Phase 1: Audit | Evidence inventory: data sources cataloged with coverage, biases, and supported claim types. SME interview summaries with elicited priors and structural beliefs. | A half-day workshop seeds it; completing it takes one to three weeks, depending on data complexity. |
| Phase 2: Categorize | Question portfolio: each stakeholder question classified by rung and cross-checked against the inventory. Answerable subset identified. Data-acquisition needs documented for the rest. | Days. Usually one to two stakeholder workshops plus a written portfolio. |
| Phase 3: SCM cycle | Per turn of the cycle: a documented DAG, one identifiable estimand, a point estimate with sensitivity bounds, and a presentation of the result with explicit assumptions. | Weeks per turn. Most engagements complete two to four turns before the answerable portfolio is exhausted. |
Phases compound. The evidence inventory from Phase 1 makes Phase 2's categorization fast. Phase 2's portfolio makes Phase 3's first DAG narrow enough to be drawable. Phase 3's first turn produces feedback that often expands the inventory in Phase 1 of the next engagement.
Most engagements begin with a half-day discovery workshop. The workshop completes Phase 1 for one business unit and seeds Phase 2 — producing a written evidence inventory and a first-draft question portfolio your team can carry forward independently.