Reconciling Expert Parameters

Three sources of numbers

In the linear Gaussian convention, edge weights start fixed at one. The work of parameter elicitation is about the parent node distributions — the mean and variance of each variable in the model. For our high-school curriculum, that means estimating distributions like “Algebra II performance in this cohort” rather than edge coefficients like “the effect of Algebra II on Algebraic Fluency.” The Pearl tradition gives three sources for those distributions, and they should agree.

Historical data. Compute the mean and variance of each variable from cohort transcripts and assessment scores. The estimates come with standard errors that scale with cohort size.
Expert elicitation. Ask each expert for a mean and a standard deviation, for example: “For a typical cohort in this school, what is the mean Algebra II performance, and how much spread do you expect around that mean?” Point estimates without uncertainty understate the disagreement; full distributions make it legible. Three experts give three distributions per parent node.
External literature. Where comparable populations have been studied, published distribution estimates apply — subject to a transportability check on whether the populations are exchangeable in the relevant respects.
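As a minimal sketch of the historical-data source, the snippet below computes a mean and standard deviation for one variable together with approximate standard errors, to show how those errors shrink as the cohort grows. The function name `summarize_cohort` and the scores are hypothetical, and the standard error of the standard deviation uses a large-sample approximation that assumes roughly Gaussian scores.

```python
import math
import statistics


def summarize_cohort(scores):
    """Return mean, SD, and approximate standard errors for one variable."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)          # sample SD (n - 1 denominator)
    se_mean = sd / math.sqrt(n)            # standard error of the mean
    # Large-sample approximation for the SE of the SD, assuming Gaussian scores.
    se_sd = sd / math.sqrt(2 * (n - 1))
    return {"n": n, "mean": mean, "sd": sd, "se_mean": se_mean, "se_sd": se_sd}


# Hypothetical Algebra II scores on a 0-1 scale, not taken from any real transcript.
small_cohort = [0.22, 0.30, 0.35, 0.41, 0.28, 0.36]
large_cohort = small_cohort * 4            # same scores repeated: four times the students

small = summarize_cohort(small_cohort)
large = summarize_cohort(large_cohort)
print(small)
print(large)
```

Both cohorts have the same mean, but the larger one reports smaller standard errors, which is the sense in which the data estimate tightens with cohort size.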

Figure: each source gives a distribution for Algebra II performance, shown as mean ± one standard deviation. Data’s interval (0.24–0.40) does not overlap any expert’s interval. The experts overlap each other. The gap is structural, not noise.
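The overlap comparison can be made mechanical. The sketch below treats each source as a mean ± one standard deviation interval and lists the pairs that fail to overlap. Only the data interval (0.24–0.40) comes from the text; the three expert estimates and the names `Estimate` and `overlaps` are hypothetical, chosen so that the experts overlap each other but not the data.

```python
from itertools import combinations
from typing import NamedTuple


class Estimate(NamedTuple):
    source: str
    mean: float
    sd: float

    def interval(self):
        """Mean plus or minus one standard deviation."""
        return (self.mean - self.sd, self.mean + self.sd)


def overlaps(a, b):
    """True when the two one-SD intervals share at least one point."""
    lo_a, hi_a = a.interval()
    lo_b, hi_b = b.interval()
    return lo_a <= hi_b and lo_b <= hi_a


estimates = [
    Estimate("data", 0.32, 0.08),       # gives the 0.24-0.40 interval from the text
    Estimate("expert_a", 0.55, 0.10),   # hypothetical
    Estimate("expert_b", 0.60, 0.08),   # hypothetical
    Estimate("expert_c", 0.58, 0.12),   # hypothetical
]

# Every pair of sources whose intervals do not overlap.
disjoint = [
    (a.source, b.source)
    for a, b in combinations(estimates, 2)
    if not overlaps(a, b)
]
print(disjoint)
```

With these numbers every disjoint pair involves the data, which is the pattern that points at structure rather than noise.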

The reconciliation procedure

The example above is the case worth attending to. The three experts’ intervals overlap each other; the data’s interval doesn’t overlap any of theirs. The temptation is to apply Bayesian updating directly — treat the experts as a prior, treat the data as a likelihood, compute a posterior. The Pearl tradition recommends investigating the gap first: data and expert estimates are answering different questions, and they should converge only when the DAG is correct.

Data answers: in the observed cohorts, what was the empirical distribution of Algebra II performance? Expert elicitation answers: what is the distribution of Algebra II performance for the cohort the model is meant to support? These are the same distribution only under the assumption that the observed cohorts are exchangeable with the target cohort — an assumption the DAG encodes structurally. When they diverge, that assumption is the first thing to check.

Three diagnoses cover most of these gaps.

Selection on an unmodelled variable. The cohorts in the data were selected (deliberately or accidentally) on a variable not in the DAG — self-regulation, say. The observed distribution of Algebra II in those cohorts is then shifted relative to the broader population the experts are reasoning about. Adding the selection variable to the DAG and re-fitting reconciles the distributions. This is the productive case.
A definition mismatch. The experts and the data are referring to different operationalisations of the same node. Data measures Algebra II performance as a transcript GPA; experts are thinking of standardized assessment scores. Resolving the mismatch — making both sources estimate the same operational variable — reconciles the distributions.
Population difference. The data was drawn from cohorts unlike the one the experts are reasoning about — a different school, an earlier curriculum, a different admissions selection. Transportability assumptions about whether the historical distribution applies to the target cohort need to be re-examined explicitly.
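The first diagnosis can be illustrated with a small simulation. This is a sketch under invented assumptions: the baseline of 0.55, the noise levels, and the selection rule (only low self-regulation students appear in the recorded cohorts) are all hypothetical, and the edge weight from self-regulation to Algebra II is fixed at one in line with the linear Gaussian convention. It shows the observed mean shifting away from the population mean, and a re-fit that includes self-regulation recovering it.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the sketch is repeatable

BASELINE = 0.55          # hypothetical population mean of Algebra II performance


def simulate_student():
    self_reg = rng.gauss(0.0, 0.10)                         # the unmodelled variable
    algebra2 = BASELINE + self_reg + rng.gauss(0.0, 0.05)   # edge weight fixed at one
    return self_reg, algebra2


population = [simulate_student() for _ in range(20_000)]
# Hypothetical selection: recorded cohorts contain only low self-regulation students.
observed = [(s, a) for s, a in population if s < -0.10]

pop_mean = statistics.fmean(a for _, a in population)
obs_mean = statistics.fmean(a for _, a in observed)

# Re-fit with self-regulation in the model: least-squares line of algebra2 on self_reg,
# then predict at the population-average self-regulation (0.0).
s_bar = statistics.fmean(s for s, _ in observed)
slope = (
    sum((s - s_bar) * (a - obs_mean) for s, a in observed)
    / sum((s - s_bar) ** 2 for s, _ in observed)
)
adjusted_mean = obs_mean - slope * s_bar

print(f"population mean   {pop_mean:.3f}")
print(f"observed (biased) {obs_mean:.3f}")
print(f"after re-fit      {adjusted_mean:.3f}")
```

The naive observed mean sits well below the population mean, while the adjusted estimate lands close to it. In a real engagement the selection variable would have to be measured or elicited, not simulated.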

For disagreements about distribution shape — one expert says the cohort is bimodal, the others say unimodal; data shows heavy tails, experts assume Gaussian — escalate immediately to structure. Shape disagreement almost always means the experts hold different mental models of the population the node represents. The graph is wrong before the parameter is.

When Bayesian updating applies

The framework that does formally combine prior beliefs with new data is Bayesian updating. Each expert’s Gaussian becomes a prior over the parent’s mean and variance; the data’s likelihood updates it; the posterior is the principled compromise. It is the textbook answer to combining sources, and it has fifty years of formal development behind it.

The Pearl-tradition objection is not to Bayesian updating itself. It is to using it first. Bayes’ rule will happily produce a posterior whether or not the DAG is correct — and when the DAG is wrong, the posterior is a precise distribution nobody believes, because it averaged what should not have been averaged. The structural gap between data and experts is a more informative signal than any posterior summary of it.
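To see why the posterior can be precise and still unconvincing, here is a minimal sketch of precision-weighted pooling. It makes a simplifying assumption: each source is treated as an independent Gaussian estimate of the same quantity with known spread. The function name `precision_weighted_pool` and the expert numbers are hypothetical; only the data interval (0.24–0.40, so mean 0.32 and standard deviation 0.08) comes from the text.

```python
import math


def precision_weighted_pool(estimates):
    """Combine (mean, sd) pairs as independent Gaussian estimates of one quantity."""
    precisions = [1.0 / sd ** 2 for _, sd in estimates]
    total = sum(precisions)
    mean = sum(p * m for p, (m, _) in zip(precisions, estimates)) / total
    return mean, 1.0 / math.sqrt(total)


sources = [
    (0.32, 0.08),   # data: the 0.24-0.40 interval from the text
    (0.55, 0.10),   # expert A (hypothetical)
    (0.60, 0.08),   # expert B (hypothetical)
    (0.58, 0.12),   # expert C (hypothetical)
]

pooled_mean, pooled_sd = precision_weighted_pool(sources)
print(f"pooled mean {pooled_mean:.3f}, pooled sd {pooled_sd:.3f}")
```

The pooled standard deviation is smaller than that of any single source, and the pooled mean falls between the data and the experts at a value none of them proposed. The arithmetic is correct; the inputs were simply not estimates of the same thing.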

Figure: the reconciled DAG with self-regulation added as a selection variable. Edges from self-regulation into Algebra II (the cohort selection) and Algebraic Fluency (the direct effect bypassing Algebra II) close the structural gap that was producing the disagreement about Algebra II’s distribution.

Figure: the same procedure, two stages. Before structural fix: data sits far left, experts cluster right, and the precision-weighted posterior is the precise distribution nobody believes. After: with self-regulation added to the DAG as a selection variable, the data’s observed Algebra II distribution shifts up (the cohort selection is now modeled) and the experts’ estimates shift down (they were implicitly imagining a less-selected population). The residual gap is small, the sources overlap, and Bayesian updating produces a posterior that sits inside the overlap, which is the legitimate compromise.

The right ordering is procedural.

Diagnose structural causes of the gap (selection on an unmodelled variable, definition mismatch, population difference).
Fix the DAG.
Re-elicit expert distributions and re-fit the data against the corrected structure.
If a gap remains after structural fixes, then apply Bayesian updating to combine the surviving sources.
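The ordering in the steps above can be sketched as a gate in front of the pooling step. This is an illustration, not a prescribed implementation: the function name `reconcile`, the rule "pool only when the data's one-standard-deviation interval overlaps at least one expert's", and all expert numbers and "after" values are assumptions made for the example.

```python
import math


def interval(est):
    mean, sd = est
    return mean - sd, mean + sd


def overlaps(a, b):
    (lo_a, hi_a), (lo_b, hi_b) = interval(a), interval(b)
    return lo_a <= hi_b and lo_b <= hi_a


def reconcile(data, experts):
    """Pool only when the data interval overlaps at least one expert interval."""
    if not any(overlaps(data, e) for e in experts):
        # Steps 1-3: the gap is structural, so send it back to the DAG.
        return {"action": "diagnose_structure", "posterior": None}
    # Step 4: precision-weighted pooling of the surviving sources.
    sources = [data, *experts]
    total = sum(1.0 / sd ** 2 for _, sd in sources)
    mean = sum(m / sd ** 2 for m, sd in sources) / total
    return {"action": "pool", "posterior": (mean, 1.0 / math.sqrt(total))}


# Before the structural fix: data interval from the text, hypothetical experts.
before = reconcile((0.32, 0.08), [(0.55, 0.10), (0.60, 0.08), (0.58, 0.12)])
# After the fix: hypothetical re-fitted data and re-elicited experts.
after = reconcile((0.50, 0.08), [(0.52, 0.09), (0.55, 0.08), (0.53, 0.10)])

print(before)
print(after)
```

The first call refuses to produce a posterior and returns a request to diagnose structure; the second pools because the sources now overlap. A stricter gate, such as requiring overlap with every expert, would be an equally defensible design choice.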

After the structural fix in the curriculum example (adding self-regulation as a selection variable), the experts’ new distributions and the data’s new estimate converge to within elicitation noise. At that point, Bayesian pooling gives a posterior that everyone in the room is willing to defend, because the inputs are no longer disagreeing about what is being measured. The posterior is the right tool for residual disagreement: the uncertainty that remains once the structure has been pinned down enough to make pooling legitimate.

Bayesian updating, in short, is the resolution mechanism. It is not the diagnostic.

What survives, what goes to the register

For the curriculum example, the Algebra II distribution disagreement traces to selection on an unmodelled variable. Self-regulation — previously on the disagreement register from the structural reconciliation — is added to the DAG, with edges into Algebra II performance and (through other paths) into the broader cohort selection. The data is re-fit conditional on the new structure; the experts re-elicit their estimates of the target cohort’s distribution. The estimates converge to within elicitation noise. The parent distribution enters the model as the Bayesian posterior over the reconciled inputs.

For the Chemistry/Physics direction question, the parametric reconciliation cannot resolve it either: both directions yield similar fit to the data, and the experts split. It joins its structural twin on the disagreement register, with a note that a cohort taking Physics before Chemistry would distinguish the two directions, and that the school has not run that sequence.

The procedure’s output is the same shape as for structure: a parameterised DAG, and an expanded register. The register now carries both kinds of unresolved disagreement — about the graph, and about the numbers it carries.

See this procedure applied in engagements: the commercial auto reserving construction walkthrough uses the same machinery to reconcile two competing accounts of post-2020 loss-development drift — one from the chief actuary, one from the head of claims — into a single parameterised model. The FAIR cybersecurity risk construction walkthrough applies it to reconcile two estimates of reconnaissance-detection rate that turn out to be estimates of different conditional quantities — the structural insight that breaks the apparent disagreement.

Pearl, J., 2009, Causality: Models, Reasoning, and Inference (2nd ed.), Cambridge University Press. Especially on faithfulness and the connection between conditional independence and parameter estimation.
Hernán, M.A. & Robins, J.M., 2020, Causal Inference: What If, Chapman & Hall.
O’Hagan, A. et al., 2006, Uncertain Judgments: Eliciting Experts’ Probabilities, Wiley. The practitioner-facing treatment of probability elicitation, including consistency between data and expert estimates.
Morgan, M.G. & Henrion, M., 1990, Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis, Cambridge University Press.

