Statistical evidence in criminal trials suffers from a structural problem: the expert answers a question about populations, and the law asks a question about this defendant, this act, this harm. The standard analysis computes P(evidence | guilt), the probability of observing this evidence if the defendant were guilty. The law requires P(guilt | evidence), the probability that the defendant is guilty given this evidence. These are not the same quantity, and the conversion between them requires a prior probability that the expert must have assumed, implicitly or explicitly. Closing the wider gap between populations and this defendant requires climbing Pearl's ladder of causation from Rung 1 (what does the data show?) to Rung 3 (what would have happened if this defendant had acted differently?). That is not a statistical question. It is a causal one.

Standard approach vs. causal approach, by analysis component:

  • Probability direction
      Standard approach: P(evidence | guilt), what the expert computes
      Causal approach: P(guilt | evidence), what the law requires
  • Individual vs. population
      Standard approach: rates across a reference population of cases
      Causal approach: this defendant, this incident, this outcome
  • Causation standard
      Standard approach: correlation (presence of DNA, pattern of arrest)
      Causal approach: but-for counterfactual (would harm have occurred without the act?)
  • Confounders
      Standard approach: unmodeled; policing intensity treated as signal
      Causal approach: explicitly separated using the do() operator

No machine running on observational data alone can derive an answer to a counterfactual question.
— Judea Pearl, Causality: Models, Reasoning, and Inference, 2009
Dialog: But for the defendant's act, would the harm have occurred? — answered: this is a causal question. Most statistical experts answer 'how likely is this evidence under guilt?' — a different question entirely.
3 Questions, 3 Rungs
  1. Would this harm have occurred but for the defendant’s act? — Rung 3 (Counterfactual). The but-for test requires holding all actual background conditions fixed and computing the consequence of a changed act. No statistical model can do this without a causal structure; Situational Risk Context is the confounder whose back-door must be closed by abduction before the counterfactual is applied.
  2. If we intervened to change this defendant’s arrest record, would the risk score change for the right reasons? — Rung 2 (Intervention). do(Prior Arrests) severs the upstream causes of arrest — policing intensity and socioeconomic stress — isolating only the direct effect. The gap between obs() and do() is the confounder the actuarial tool cannot see.
  3. What does the DNA match actually tell us about guilt? — Rung 1 (Association). The graph encodes P(guilt | evidence), which is what the law requires. The prosecutor’s figure is P(evidence | innocence), which is what the expert computed. Reading the correct posterior from the graph makes the inversion precise and cross-examinable.

Reading the screenshots: a black check mark on a node means it has been set as observed evidence — a fact entered into the model, acting as a filter. A red check mark means it has been set as a do intervention — a decision applied to the model, severing the influence of its parents.

Reading the spec tables: each Run the Analysis block lists the exact steps to reproduce each screenshot in Bayes Server. The Obs / Do column uses three italic control tokens: clear — reset the model to a blank no-evidence state; abduction step — enter the factual observations that anchor the U nodes to this specific case; use abduction result — apply a do() intervention with the U nodes held from the abduction step.

Rung 3 — Counterfactual

But For Is Rung 3

“Would this harm have occurred but for the defendant’s act?”

Criminal law’s but-for standard is a counterfactual question: hold all background conditions at their actual values, change the defendant’s act, and compute the consequence. Rung 2 gives the average effect of changing an act across all cases like this one. Courts ask a narrower question: given everything that was actually true in this incident, what changes if we change just this one decision? That is Rung 3, and it requires a causal structure to answer.

The model adds a confounder — Situational Risk Context — that independently makes culpable acts more likely and independently elevates harm probability through mechanisms unrelated to the defendant’s choices. This creates a back-door path the observational query cannot close. The three-step counterfactual procedure closes it: abduction anchors the background risk to this incident, then the intervention changes only the defendant’s act while holding that background fixed.

Answer

Abduction: obs(Defendant’s Act = At Fault) + obs(Harm = Occurred) + obs(ISC = Absent) — But-For Cause = 92% Yes. The background of this specific incident is now anchored in the U nodes; SRC updates to High 71.8%. Counterfactual: release Harm Outcome, then do(Defendant’s Act = Not At Fault) with U posteriors held. P(Harm = Occurred) drops to 16.1% — the residual hazard from the elevated situational background that exists independently of the defendant’s choices. For comparison, obs(Act = Not At Fault) from the same abducted state gives 13.9%, because obs() allows SRC to infer a lower-risk background — the 2-point gap is the confounding correction. The ISC test from the same abducted state: switching ISC to Present collapses But-For to 70% No, confirming the causal chain breaks when an independent sufficient cause is present.
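
The three-step procedure (abduction, action, prediction) can be sketched on a toy structural model. This is a minimal illustration, not the CriminalCausation-AllRungs.bayes model: the variable names, structural equations, and every probability below are invented for the example, and it omits the Independent Sufficient Cause node. It enumerates all background "worlds", keeps those consistent with the actual incident, then replays them with the act switched off.

```python
from itertools import product

# Hypothetical priors for the background (exogenous) variables.
# None of these numbers come from the Bayes Server model.
P_EXO = {
    "src": 0.3,     # situational risk context is high
    "ua_hi": 0.6,   # disposition to act when the context is high-risk
    "ua_lo": 0.2,   # disposition to act when the context is low-risk
    "uh_act": 0.8,  # the act, if taken, produces harm
    "uh_bg": 0.3,   # background hazard produces harm in a high-risk context
}
NAMES = list(P_EXO)


def act_eq(u):
    """Structural equation for the defendant's act (1 = at fault)."""
    return u["ua_hi"] if u["src"] else u["ua_lo"]


def harm_eq(u, act):
    """Structural equation for harm, given the act and the background."""
    return int((act and u["uh_act"]) or (u["src"] and u["uh_bg"]))


def worlds():
    """Yield every background assignment together with its prior weight."""
    for values in product((0, 1), repeat=len(NAMES)):
        u = dict(zip(NAMES, values))
        weight = 1.0
        for name, value in u.items():
            weight *= P_EXO[name] if value else 1.0 - P_EXO[name]
        yield u, weight


def counterfactual_harm():
    """P(harm would have occurred without the act | act and harm occurred)."""
    # Step 1, abduction: keep only worlds consistent with the actual incident.
    consistent = [(u, w) for u, w in worlds()
                  if act_eq(u) == 1 and harm_eq(u, 1) == 1]
    total = sum(w for _, w in consistent)
    # Steps 2 and 3, action and prediction: force act = 0 in those same worlds.
    return sum(w for u, w in consistent if harm_eq(u, 0) == 1) / total


def observational_harm_given_no_act():
    """P(harm | act = 0) read off the population, with no abduction."""
    num = den = 0.0
    for u, w in worlds():
        if act_eq(u) == 0:
            den += w
            num += w * harm_eq(u, 0)
    return num / den


cf = counterfactual_harm()
obs = observational_harm_given_no_act()
print(f"counterfactual P(harm | act removed, this incident) = {cf:.1%}")
print(f"observational  P(harm | no act, population)        = {obs:.1%}")
print(f"probability the act was a but-for cause             = {1 - cf:.1%}")
```

With these invented numbers the counterfactual figure comes out near 20% and the population figure near 5%. The difference arises because abduction shifts belief toward a high-risk background for this incident, and that background is held fixed when the act is changed.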

CriminalCausation-AllRungs.bayes
| Image | Obs / Do | Node | Set |
| --- | --- | --- | --- |
| Prior | | | |
| but-for-prior | clear | But-For Cause | No 68.7% / Yes 31.3% |
| Abduction: anchor the U nodes to this specific incident | | | |
| but-for-abduction | clear | | |
| | obs | Defendant's Act | At Fault |
| | obs | Harm Outcome | Occurred |
| | obs | Independent Sufficient Cause | Absent |
| | | But-For Cause | No 8% / Yes 92% |
| Counterfactual: change one thing, hold U nodes, read Harm Outcome | | | |
| but-for-do | abduction step | | |
| | do | Defendant's Act | Not At Fault |
| | | Harm Outcome | Occurred 16.1% |
| but-for-obs | abduction step | | |
| | obs | Defendant's Act | Not At Fault |
| | | Harm Outcome | Occurred 13.9% |
| but-for-isc-present | abduction step | | |
| | obs | Independent Sufficient Cause | Present |
| | | But-For Cause | No 70% / Yes 30% |

But-For Causation — prior, no evidence set
Abduction — actual incident anchored

obs(Act = At Fault) + obs(Harm = Occurred) + obs(ISC = Absent). Situational Risk Context updates toward High. U_Act and U_Harm posteriors anchor the background of this specific incident. But-For Cause: 92% Yes.

Cross-Examination Question

“Your model is based on historical data from a population of cases. When you say this defendant posed a high risk — was that a statement about this defendant specifically, or about defendants who share certain characteristics with this defendant? These are different things. Did your model compute what would have happened to this defendant if the intervention you’re recommending had been applied to him ten years ago?”

No regression model or actuarial risk tool can answer this. The question isolates exactly what the model cannot do — and what the legal standard requires.

Rung 2 — Intervention

The Confounder in the Witness Box

“If we intervened to change this defendant’s arrest record, would the risk score change for the right reasons?”

The problem with observational data is that policing intensity and socioeconomic stress both drive prior arrests, so any regression conflates what arrests signal with why they happen. The model separates them using the do() operator: observing an arrest updates its upstream causes because it gives us information about them; setting an arrest by intervention severs that connection entirely, leaving the risk score to rise in the absence of any information about what drove the arrest.

Answer

Under obs(), Actual Reoffending Risk (ARR) rises to 40.8%. Under do(), it stays at 27.9%, unchanged from the prior. The gap is the confounder. When the model observes prior arrests, it updates Socioeconomic Stress (SES) and Policing Intensity (PI) upstream to explain the arrest, and ARR rises because SES rises. When the model intervenes with do(), those parent links are severed, so SES and PI stay at their priors and ARR stays flat. The risk score fires in both cases (High 90%), but in the do() world actual risk has not moved: the score is measuring arrests, not risk. The 13-point gap makes the confounding correction precise and cross-examinable.
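
The difference between conditioning and intervening can be sketched on a toy network. This is a minimal illustration under stated assumptions: the four-node structure (stress and policing drive arrests, stress alone drives actual risk) and all conditional probabilities are invented, and they are not the values in CriminalCausation-AllRungs.bayes. Observing an arrest revises belief about its upstream causes; setting an arrest by intervention leaves those causes at their priors.

```python
from itertools import product

# Hypothetical priors and conditional probabilities (illustrative only).
P_SES = {1: 0.3, 0: 0.7}   # socioeconomic stress high / low
P_PI = {1: 0.4, 0: 0.6}    # policing intensity high / low
P_ARREST = {               # P(prior arrest = yes | ses, pi)
    (0, 0): 0.05, (0, 1): 0.30,
    (1, 0): 0.25, (1, 1): 0.60,
}
P_RISK = {0: 0.15, 1: 0.55}  # P(actual reoffending risk = high | ses)


def joint(ses, pi, arrest, risk):
    """Joint probability in the unmodified (observational) graph."""
    pa = P_ARREST[(ses, pi)]
    pr = P_RISK[ses]
    return (P_SES[ses] * P_PI[pi]
            * (pa if arrest else 1.0 - pa)
            * (pr if risk else 1.0 - pr))


def risk_given_obs_arrest():
    """P(risk = high | arrest = yes): conditioning updates SES and PI."""
    num = sum(joint(s, p, 1, 1) for s, p in product((0, 1), repeat=2))
    den = sum(joint(s, p, 1, r) for s, p, r in product((0, 1), repeat=3))
    return num / den


def risk_given_do_arrest():
    """P(risk = high | do(arrest = yes)): the arrows into arrest are cut,
    so arrest is fixed at yes and SES and PI keep their priors."""
    return sum(P_SES[s] * P_PI[p] * P_RISK[s]
               for s, p in product((0, 1), repeat=2))


prior_risk = sum(P_SES[s] * P_RISK[s] for s in (0, 1))
p_obs = risk_given_obs_arrest()
p_do = risk_given_do_arrest()
print(f"prior              P(risk high) = {prior_risk:.1%}")
print(f"obs(arrest = yes)  P(risk high) = {p_obs:.1%}")
print(f"do(arrest = yes)   P(risk high) = {p_do:.1%}")
```

In this sketch the interventional figure equals the prior because arrests have no causal path to actual risk; the observational figure is higher only because an arrest is evidence of stress.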

CriminalCausation-AllRungs.bayes
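| Image | Obs / Do | Node | Set |
| --- | --- | --- | --- |
| confounder-prior | clear | Actual Reoffending Risk (ARR) | High 27.9% |
| | | Risk Assessment Score (RAS) | High 35.7% |
| confounder-obs | clear | | |
| | obs | Prior Arrests | Yes |
| | | Risk Assessment Score | High 90% |
| | | Actual Reoffending Risk | High 40.8% (SES and PI update upstream) |
| confounder-do | clear | | |
| | do | Prior Arrests | Yes |
| | | Risk Assessment Score | High 90% |
| | | Actual Reoffending Risk | High 27.9% (unchanged) |
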
Expert Witness Confounder — prior, no evidence set, Actual Reoffending Risk 27.9% High
obs(Prior Arrests = Yes) — evidence entered

Observing an arrest updates both socioeconomic stress and policing intensity. The graph encodes which dependencies exist — and they flow in both directions.

Cross-Examination Question

“Your model uses prior arrest record as a predictor. Did your model account for differences in policing intensity across neighborhoods when computing that score? If two defendants had identical actual behavior but lived in areas with different arrest rates, would your model produce different risk scores for them? Have you tested that? What is the effect size?”

The expert either acknowledges the confounding problem, or claims the model accounts for it — in which case the follow-up is to ask for the mechanism. No standard actuarial tool has this mechanism.

Rung 1 — Association

The Prosecutor’s Fallacy

“What does the DNA match actually tell us about guilt?”

At Rung 1 the model is running as a filter: enter what you know, read what updates. The graph encodes which dependencies exist — Actual Guilt drives the DNA Test Result — so setting DNA Test Result = Match updates the probability of Actual Guilt through that connection. This is where almost all prosecution experts stop, and where the fallacy lives: the expert reports P(match | innocent), and the jury hears P(innocent | match). Bayes’ theorem shows these are radically different quantities.

Answer

The correct posterior on Actual Guilt after a DNA match is 9.1% — not 99%. With a prior of 1 in 1,000 (Actual Guilt = 0.1%) and a false positive rate of 1%, a match updates guilt to 9.1% via Bayes’ theorem. The Prosecutor’s Inference node encodes the inverted figure: 100% Likely Guilty. The two nodes sit side by side — 9.1% and 100% — and the cross-examination question writes itself: what prior probability of guilt does your figure assume? The answer is in the Actual Guilt node.
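
The arithmetic behind the 9.1% figure can be checked directly. A minimal sketch: the prior and the false positive rate are the ones stated above, while the sensitivity of 1.0 (a guilty person always matches) is an assumption not stated in the text, chosen because it reproduces the quoted posterior.

```python
def posterior_guilt(prior, sensitivity, false_positive_rate):
    """P(guilty | match) by Bayes' theorem."""
    p_match = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_match


prior = 1 / 1000             # one suspect among 1,000 (from the text)
false_positive_rate = 0.01   # P(match | innocent) (from the text)
sensitivity = 1.0            # P(match | guilty): assumed, not in the text

# Overall chance of a match before the test is run.
p_match = sensitivity * prior + false_positive_rate * (1.0 - prior)
post = posterior_guilt(prior, sensitivity, false_positive_rate)

print(f"P(match)          = {p_match:.1%}")
print(f"P(guilty | match) = {post:.1%}")
```

Under that assumption the overall match probability is about 1.1%, which is the same figure the Prosecutor’s Inference node shows at the prior.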

CriminalCausation-AllRungs.bayes
| Image | Obs / Do | Node | Set |
| --- | --- | --- | --- |
| prosecutors-fallacy-prior | clear | Actual Guilt | Guilty 0.1% |
| | | Prosecutor’s Inference | 1.1% Likely Guilty |
| prosecutors-fallacy-match | clear | | |
| | obs | DNA Test Result | Match |
| | | Actual Guilt | Guilty 9.1% |
| | | Prosecutor’s Inference | 100% Likely Guilty |

Prosecutor's Fallacy model at prior — Actual Guilt 0.1%, DNA Test Result uninformed
Prior — no evidence set

One suspect in a database of 1,000. Before any DNA evidence, the prior probability of guilt is 0.1%.

Cross-Examination Question

“Doctor, when you say the probability of a random match is 1 in 10,000 — that is the probability that an innocent person would match the profile, is that right? That is not the same as the probability that this defendant is innocent given the match. Did you calculate the latter? What prior probability of guilt did you use in that calculation?”

Most experts have not done this calculation. The question forces them to either acknowledge the gap or attempt a Bayesian calculation on the stand — which will reveal the base rate they implicitly assumed.

Question 1 — Which probability did you compute?

Force the expert to distinguish between:

  • P(evidence | defendant is guilty) — what most experts compute
  • P(defendant is guilty | evidence) — what the law needs

These are related by Bayes’ theorem, and the conversion requires a prior probability that the expert must have assumed — implicitly or explicitly. Ask them to state it.
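
One way to make the expert state the prior is to run Bayes' theorem backwards from the figure they reported. A minimal sketch in odds form, using illustrative inputs: a claimed 99% probability of guilt, a 1% false positive rate, and an assumed sensitivity of 1.0. The function name and parameters are hypothetical.

```python
def implied_prior(claimed_posterior, sensitivity, false_positive_rate):
    """Prior P(guilty) that would be needed to justify a claimed posterior.

    Odds form of Bayes' theorem:
        posterior odds = likelihood ratio * prior odds
    """
    likelihood_ratio = sensitivity / false_positive_rate
    posterior_odds = claimed_posterior / (1.0 - claimed_posterior)
    prior_odds = posterior_odds / likelihood_ratio
    return prior_odds / (1.0 + prior_odds)


assumed = implied_prior(0.99, sensitivity=1.0, false_positive_rate=0.01)
print(f"a 99% posterior implies a prior of about {assumed:.1%}")
```

With these inputs the implied prior is roughly 50%: the figure only holds if the defendant was as likely guilty as not before the evidence was considered.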

Question 2 — Did your model answer a population question or an individual one?

Statistical models produce answers about distributions. The law asks about individuals. Ask the expert:

  • Was your conclusion about defendants with characteristics similar to this defendant, or about this defendant?
  • If we intervened to change the factor you identified — not observe it changing, but actually change it — what would happen to this outcome for this person?
  • Does your model have a mechanism for that calculation?

Question 3 — What variables did your model not include?

Every statistical model omits variables. Ask the expert to identify the three most important variables their model did not include. Then ask:

  • Could any of those omitted variables be correlated with both the predictor and the outcome?
  • If so, what is the direction of the bias that omission introduces?
  • Has your model been validated in populations with the same distribution of those omitted variables as this defendant?

What a Defensible Causal Opinion Looks Like

An expert who has built a structural causal model can answer all three questions. They can show the causal graph, identify which variables are confounders and which are mediators, demonstrate what the counterfactual calculation produces for this specific defendant, and show the sensitivity of their conclusion to the prior probabilities they assumed.

That is not what most forensic statistical experts provide. The gap is not a question of degree — it is a structural gap between two different categories of reasoning.

A single file with three disconnected subgraphs — one per rung. Download, open in Bayes Server, and follow the Run the Analysis steps on this page. The model does the calculation; you observe where the logic breaks.

CriminalCausation-AllRungs.bayes — Rungs 1, 2 & 3
15 nodes across three subgraphs. Rung 1: obs(DNA Match) → Actual Guilt 9.1% vs Prosecutor’s Inference 100% Likely Guilty; the gap is one unstated prior. Rung 2: obs(Prior Arrests = Yes) → Actual Reoffending Risk 40.8%; do(Prior Arrests = Yes) → 27.9%; the 13-point gap is the confounder. Rung 3: abduction anchors U_Act and U_Harm to this incident; do(Not At Fault) gives P(Harm) 16.1%; the 2-point gap vs obs(Not At Fault) at 13.9% is the confounding correction the but-for standard requires.

All models require Bayes Server (free edition available). See Download Models for the full library.

Next Step

The prosecution’s expert will be cross-examined for thirty minutes. The structural problems in their analysis took years to become embedded in the field. The cross-examination questions are available to any lawyer who understands what the models cannot do.

The models are free. What I provide is the judgment to build the right structure for your specific situation, encode your experts’ knowledge into it, and turn the output into decisions your court can act on. The discipline stays with your team.

info@rung3.ai

This case is a composite drawn from published case law, academic literature on forensic statistics, and algorithmic risk assessment research. No individual case or engagement is described. The Bayes Server models are working files: download, set evidence, and run inference.