How Causal Models Compose

Two Structural Causal Models (SCMs) that look composable can both be internally valid and produce silent nonsense when joined. The composition primitives are the artifacts that turn unspoken assumptions about how models connect into checkable claims — and that force the LLM mediator to declare what it would otherwise paper over.

The argument for the Library

This page covers one component of the Library. For the case for the Library itself — why SCMs, not ML or LLMs, are the framework that justifies a library — see Why Structured Causal Models?

The Composition Problem

Consider two models, each correct in its own frame:

Model A — Labor supply. Predicts hours worked from wage, education, and household composition. Variable: income, defined as monthly post-tax household income.

Model B — Health outcomes. Predicts mortality from income, age, and lifestyle. Variable: income, defined as annual pre-tax individual income.

A user asks: "How would expanded employment affect population mortality?" The LLM finds both models in the library, sees they share an income node, joins them at that node, and produces an answer. The answer reads fluently. It is also wrong — not by a small fraction, but in ways that would change a policy recommendation.

Three things differ between the two incomes: time grain (monthly versus annual), tax basis (post-tax versus pre-tax), and unit of aggregation (household versus individual). Each is a separate translation problem. None of them is recoverable from the surface label "income."

The failure mode

Both models pass internal validation. Both compose into a graph that looks well-formed. The LLM produces an answer the user can act on. The library has no idea anything went wrong.

Three operations the word "compose" hides

This is one face of a more general problem. The verb compose conflates three distinct technical operations, each of which fails differently:

Operation	What it asks	How it fails silently
Referent alignment	Do these two nodes denote the same real-world quantity?	Same name, different referent. Income/income, risk/risk, exposure/exposure.
Graph union	Is the merged DAG well-formed and acyclic?	One model's effect direction is the other's cause direction. The cycle goes unnoticed because each parent was acyclic on its own.
Assumption reconciliation	Does the joined model's identification status follow from the parents'?	Identifiability is treated as inheritable. It isn't. New backdoors open through the seam.

Most composition failures come from conflating these. Engineers think they are doing graph union; what they are actually doing is referent alignment by hope and assumption reconciliation by inheritance.

Four Primitives

The primitives are not a complete solution to composition. They are the minimum vocabulary that makes composition checkable rather than fluent. Each one converts a category of silent assumption into a machine-readable artifact that the user — or another model in the library — can challenge.

1. Referent declaration

Every variable in every model carries a structured referent, not just a name. The referent specifies — at minimum — unit, basis, population, and time grain, along with a pointer to the empirical distribution from which the variable's prior was drawn.

When two referents differ, the library detects the mismatch automatically. Composition does not proceed until the mismatch is acknowledged. The point of the referent is not pedantic precision; it is to keep a shared name from laundering ambiguity.
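A minimal sketch of referent declaration and mismatch blocking, assuming a Python dataclass whose fields follow the dimensions named above. The class, the function names, and the `source` labels are hypothetical; leaving `source` out of the comparison is a design choice of this sketch, since two models can draw priors from different data and still denote the same quantity.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Referent:
    """What a variable denotes. Field names are illustrative, not a fixed schema."""
    unit: str
    basis: str
    population: str
    time_grain: str
    source: str  # pointer to the empirical distribution behind the prior


class UnacknowledgedMismatch(Exception):
    """Raised when a join is attempted with mismatches nobody has signed off on."""


def mismatches(a: Referent, b: Referent) -> dict:
    """Map each differing field (except `source`) to its (a, b) values."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(Referent)
        if f.name != "source" and getattr(a, f.name) != getattr(b, f.name)
    }


def check_join(a: Referent, b: Referent, acknowledged=()) -> dict:
    """Return the mismatches if all are acknowledged; otherwise block the join."""
    found = mismatches(a, b)
    pending = sorted(set(found) - set(acknowledged))
    if pending:
        raise UnacknowledgedMismatch(pending)
    return found


# The two "income" variables from the composition problem.
income_a = Referent("USD/month", "post-tax", "household", "monthly", "model_a_prior")
income_b = Referent("USD/year", "pre-tax", "individual", "annual", "model_b_prior")
```

Calling `check_join(income_a, income_b)` raises until all four differing fields are passed in `acknowledged`; the label "income" never enters the comparison at all.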

2. Bridge nodes

When two referents differ but the underlying quantity is in principle convertible, a bridge node is inserted between them. The bridge is itself a small SCM: it takes the source referent as input, applies an explicit transformation, and outputs the target referent.

The bridge has its own assumptions and its own scope conditions. A bridge from monthly post-tax household income to annual pre-tax individual income requires assumptions about household composition (a four-person household is not four times a one-person household) and tax incidence (the marginal tax rate is not a constant). Each assumption is declared. Each is checkable. Each can fail.

The closest precedent is the input/output node of object-oriented Bayesian networks (Koller & Pfeffer, 1997) and the related fragment-composition framework of Laskey & Mahoney (1997). Bridge nodes generalize that idea: they carry their own assumption sets and scope conditions rather than serving as pure passthrough connectors.
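The monthly-household-to-annual-individual bridge can be sketched as a small callable whose parameters are its declared assumptions. Everything here is hypothetical: the fixed `person_share` split is a placeholder household-composition rule, and the single-threshold progressive schedule is a toy tax-incidence model chosen because it is easy to invert, not a description of any real tax code.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class IncomeBridge:
    """Hypothetical bridge: monthly post-tax household -> annual pre-tax individual income.

    Each field is a declared, swappable assumption, not a fact about any real tax system.
    """
    person_share: float   # household-composition rule: this person's share of household income
    tax_threshold: float  # tax-incidence model: annual individual income below this is untaxed
    tax_rate: float       # tax-incidence model: marginal rate above the threshold (< 1)

    def assumptions(self) -> dict:
        """Expose the assumption set so it can be inspected or challenged."""
        return asdict(self)

    def pre_to_post(self, pre: float) -> float:
        """Forward schedule: annual pre-tax individual -> annual post-tax individual."""
        return pre - self.tax_rate * max(0.0, pre - self.tax_threshold)

    def __call__(self, monthly_household_post: float) -> float:
        annual_household_post = 12 * monthly_household_post          # time grain
        individual_post = self.person_share * annual_household_post  # unit of aggregation
        if individual_post <= self.tax_threshold:                    # tax basis: invert schedule
            return individual_post
        return self.tax_threshold + (individual_post - self.tax_threshold) / (1 - self.tax_rate)


bridge = IncomeBridge(person_share=0.5, tax_threshold=12_000, tax_rate=0.25)
annual_pre = bridge(3_000)
```

Because post-tax income rises strictly with pre-tax income whenever the rate is below 1, the schedule inverts cleanly, and `pre_to_post` lets the inversion be checked. Swapping in an alternative bridge amounts to constructing another `IncomeBridge` with different assumptions, or another class with the same call signature.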

The library's job is not to prevent you from joining models. It is to make the joining honest.

3. Assumption pushforward

When two SCMs are joined, the new graph's identifiability has to be recomputed from scratch. Inheriting the parents' identification claims is the most common bug in composition.

Suppose Model A has a clean identification path for P(Y | do(X)), and Model B has one for P(Z | do(Y)). The joined model lets the user ask P(Z | do(X)), but the path through Y might pass through a node that is a confounder in one parent's frame and a collider in the other's, or might open a backdoor that neither parent had reason to address.

Identifiability does not compose. It has to be re-derived in the joined graph. Pushforward is the primitive that forces this re-derivation rather than letting it be assumed. The data-fusion literature (Bareinboim & Pearl, 2016) develops the rigorous theory; the primitive here is the operational form a library has to expose to its users.
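To make the re-derivation concrete, here is a small sketch that unions two edge sets and lists the backdoor paths left open, using the standard d-separation blocking rules (a non-collider in the conditioning set blocks; a collider blocks unless it or a descendant is conditioned on). The toy graph (shared driver U, treatment X, seam node Y, outcome Z) and all function names are hypothetical, not the library's API, and exhaustive path enumeration is only suitable for small graphs.

```python
def descendants(edges, node):
    """All nodes reachable from `node` along directed edges (parent, child)."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def simple_paths(edges, start, end):
    """Every simple path between start and end in the undirected skeleton."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    def walk(path):
        if path[-1] == end:
            yield path
            return
        for nxt in sorted(neighbours.get(path[-1], ())):
            if nxt not in path:
                yield from walk(path + [nxt])

    yield from walk([start])


def is_open(edges, path, given):
    """True if no node on the path blocks it, given the conditioning set."""
    for left, mid, right in zip(path, path[1:], path[2:]):
        collider = (left, mid) in edges and (right, mid) in edges
        if collider:
            if mid not in given and not (descendants(edges, mid) & given):
                return False
        elif mid in given:
            return False
    return True


def open_backdoors(edges, treatment, outcome, given=()):
    """Backdoor paths (first edge points into the treatment) left open by `given`."""
    given = set(given)
    return [p for p in simple_paths(edges, treatment, outcome)
            if (p[1], treatment) in edges and is_open(edges, p, given)]


model_a = {("U", "X"), ("X", "Y")}  # A's frame: Y | do(X) has no open backdoor
model_b = {("U", "Z"), ("Y", "Z")}  # B's frame: Z | do(Y) has no open backdoor
joined = model_a | model_b          # graph union at the shared nodes U and Y
```

Each parent query is clean in its own frame, but the union opens Y ← X ← U → Z for Model B's query and X ← U → Z for the joined query. Conditioning on X blocks the first; if U is unobserved, the re-derived estimand for P(Z | do(X)) in this particular graph is the front-door adjustment through Y.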

4. Scope intersection

The joined model is valid only where both parents' scope conditions hold simultaneously. The library computes this intersection and reports it before producing an answer. The technical term for the underlying problem in causal inference is transportability (Pearl & Bareinboim, 2014); scope intersection is the practical operation a library performs to enforce it.

The intersection can be empty. If Model A is valid for working-age adults in formal employment 2010–2020 and Model B is valid for retirees across multiple countries 1990–present, the intersection contains no one. The library should say so, rather than produce a number.

The intersection is often much narrower than either parent. A reader can then decide whether the narrowed scope still answers their question. They cannot make that decision if the library does not tell them what was narrowed and where.
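A minimal sketch of the intersection, assuming scope can be reduced to an age range, a year range, a country set, and an employment-status set. Those four dimensions, the numeric readings of "working-age" (18–64) and "retirees" (65 and over), the country assigned to Model A, and the `PRESENT` sentinel are all assumptions for illustration; real scope conditions are richer.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

PRESENT = 9999  # hypothetical sentinel for an open-ended "to present" range


@dataclass(frozen=True)
class Scope:
    ages: Tuple[int, int]      # inclusive
    years: Tuple[int, int]     # inclusive
    countries: FrozenSet[str]
    statuses: FrozenSet[str]   # employment status, e.g. "formal", "retired"


def _overlap(a, b):
    """Overlap of two inclusive ranges, or None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def intersect(a: Scope, b: Scope) -> Optional[Scope]:
    """Conditions under which both parents hold, or None if no one qualifies."""
    ages, years = _overlap(a.ages, b.ages), _overlap(a.years, b.years)
    countries, statuses = a.countries & b.countries, a.statuses & b.statuses
    if ages is None or years is None or not countries or not statuses:
        return None  # empty: report it instead of producing a number
    return Scope(ages, years, countries, statuses)


model_a = Scope((18, 64), (2010, 2020), frozenset({"US"}), frozenset({"formal"}))
model_b = Scope((65, 120), (1990, PRESENT), frozenset({"US", "CA", "UK"}), frozenset({"retired"}))
```

Here `intersect(model_a, model_b)` returns `None` because both the age ranges and the status sets are disjoint, which is the cue to say "no one" rather than produce a number. A non-empty result is itself the narrowed scope, so it can be shown to the user alongside any answer.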


A Worked Example

Apply the primitives to the failure case from the opening section. The user wants to ask: how would expanded employment affect population mortality? The naive join produced a fluent wrong answer in milliseconds. The principled join takes longer, narrows the question, and produces something the user can act on.

Referent declaration catches the mismatch

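The two income nodes carry structured referents, and every declared field differs. Unit and time grain record the same monthly-versus-annual difference, so there are three independent mismatches: time grain, tax basis, and unit of aggregation. Composition is paused.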

Model A · income

unit: USD/month

basis: post-tax

population: household

time-grain: monthly

Model B · income

unit: USD/year

basis: pre-tax

population: individual

time-grain: annual

A bridge node is inserted

The bridge converts monthly post-tax household income into annual pre-tax individual income. It is itself a small model, with two declared assumptions: a household-composition rule (used to disaggregate household to individual) and a tax-incidence model (used to invert the post-tax basis). Both are scoped to the same population the joined model will eventually claim to apply to.

Both assumptions can be wrong. The library does not hide this; it surfaces it. A user who disagrees with either assumption can swap in an alternative bridge.

Assumption pushforward recomputes identification

Each parent query was identifiable in its own frame. The joined query P(mortality | do(employment_status)) is not identifiable for free. The pushforward surfaces a new backdoor: education predicts employment, and education also predicts mortality through health behaviors that neither parent model represented.

The library's response is to identify the missing adjustment set, flag the variables it would need, and either request them from a third model or report that the query is unidentifiable from currently available evidence. It does not silently produce a number.
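A rough sense of the stakes comes from a simulation. The sketch below assumes a linear-Gaussian world with made-up coefficients, treats employment and mortality as continuous for simplicity, and lets education affect mortality directly (standing in for the unmodeled health-behavior channel). It compares chaining the two parents' estimates against two estimates re-derived in the joined graph.

```python
import numpy as np


def ols(target, *regressors):
    """OLS slope coefficients of target on regressors (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones_like(target), *regressors])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[1:]


rng = np.random.default_rng(0)
n = 200_000
education = rng.normal(size=n)                                   # in neither parent model
employment = education + rng.normal(size=n)
income = 2.0 * employment + rng.normal(size=n)                   # the shared seam node
mortality = 1.5 * income + 3.0 * education + rng.normal(size=n)  # hypothetical coefficients

true_effect = 2.0 * 1.5  # structural effect of do(employment) on mortality

a = ols(income, employment)[0]        # Model A: employment -> income (unconfounded here)
b_naive = ols(mortality, income)[0]   # Model B in its own frame: sees only income
naive = a * b_naive                   # inherit both parents' answers and chain them

b_front = ols(mortality, income, employment)[0]  # block the seam backdoor via employment
front_door = a * b_front

backdoor = ols(mortality, employment, education)[0]  # adjust for education directly
```

Analytically, the naive chain converges to 2 × 19.5/9 ≈ 4.33 against a true effect of 3.0. Both re-derived estimates recover 3.0: one adjusts Model B's step for employment (the front-door route), the other adjusts for education directly, which is possible only if some third model or dataset supplies education.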

Scope intersection narrows the valid query

Model A is valid for working-age adults in formal employment, US, 2010–2020. Model B is valid for adults aged 25–65 across the US, Canada, and the UK, 2005–present. The joined model is valid only at the intersection: working-age adults aged 25–65 in formal employment, US, 2010–2020.

The library reports this scope alongside any answer it produces. A user asking the policy question for a 2025 cohort, or for self-employed workers, or for a country outside the US, is told that the answer does not extend that far — and given a path to the missing models that would.

The naive join took milliseconds and produced a number. The principled join took longer and produced a smaller number, scoped, qualified, and traceable to four declared assumptions. Only one of the two answers is useful.

What This Asks of the Mediator

Composition is exactly the place where the LLM is most tempted to paper over mismatches with fluent language. When a user describes a situation and the LLM retrieves an SCM with an income variable, the LLM can produce a confident explanation of why that SCM fits. The explanation will sound principled. It will not be principled. It will be fluent.

The real job of the composition primitives is not data plumbing. It is converting the silent assumptions behind the LLM's surface fluency into machine-readable artifacts that the user can interrogate. Each primitive forces the mediator to declare something it would otherwise leave implicit:

Referents force the mediator to commit to a specific quantity, not a word.
Bridges force the mediator to declare the conversion model and its assumptions.
Pushforward forces the mediator to recompute identification in the joined graph, not assume it.
Scope intersection forces the mediator to report the narrowed valid range alongside any answer.
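One way to picture "forced to declare" is a gate that refuses to release a number until each primitive has left an artifact. The record fields and function names below are hypothetical, and the artifacts are reduced to strings and sets purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class CompositionRecord:
    """Hypothetical record a mediator populates before any answer is released."""
    detected_mismatches: Set[str] = field(default_factory=set)
    acknowledged: Set[str] = field(default_factory=set)
    bridges: List[str] = field(default_factory=list)  # names of inserted bridge models
    identification: Optional[str] = None  # re-derived estimand, or "unidentifiable"
    scope: Optional[str] = None           # computed intersection, or "empty"


def missing_declarations(rec: CompositionRecord) -> List[str]:
    """Which of the four primitives has not yet produced its artifact."""
    gaps = []
    if rec.detected_mismatches - rec.acknowledged:
        gaps.append("referent")
    if rec.detected_mismatches and not rec.bridges:
        gaps.append("bridge")
    if rec.identification is None:
        gaps.append("pushforward")
    if rec.scope is None:
        gaps.append("scope")
    return gaps


def release(rec: CompositionRecord, estimate: float):
    """Return (estimate, scope); withhold the number when it cannot be supported."""
    gaps = missing_declarations(rec)
    if gaps:
        raise ValueError(f"undeclared: {gaps}")
    if rec.identification == "unidentifiable" or rec.scope == "empty":
        return None, rec.scope  # say so instead of producing a number
    return estimate, rec.scope
```

The design choice worth noting is that an unidentifiable query or an empty scope still returns the record's scope; the user learns why there is no number instead of receiving a fluent one.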

The primitives sit in the lineage of model cards (Mitchell et al., 2019) and dataset documentation more generally — checkable artifacts that travel with the model — but with one important difference. Model cards are passive: a reader consults them. Composition primitives are active: the library checks them, the mediator is forced to populate them, and downstream queries depend on them resolving.

The point of the primitives

Composition primitives are not infrastructure for joining models. They are infrastructure for forcing the LLM to be honest about what it is doing — converting fluent-sounding paraphrase into artifacts the rest of the library, and the user, can check.

What Gets Harder

The primitives address a category of failure. They do not exhaust the problem of composition. Three classes of difficulty resist the primitives — and pretending otherwise is its own dishonesty.

Cross-boundary structural assumptions

When two models are joined, the union DAG sometimes reveals causal paths that neither parent specified. The primitives can detect that the path exists and flag the missing adjustment set. They cannot supply the missing structural assumption itself. This is where domain experts are irreplaceable, and where the library has to know its place: it can frame the question for a human, but it cannot answer it.

Cyclic dependencies between models

Two models can each be acyclic on their own and produce a cycle when joined: Model A's output feeds Model B's input, which in turn feeds Model A's. The primitives detect the cycle. Resolving it requires either dynamic modeling — turning the cycle into a temporal sequence with explicit lags — or rejecting the join entirely. Neither is automatic, and neither is always available.
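Detecting such a cycle is mechanical; Python's standard `graphlib` module reports one when a graph is prepared for topological sorting. The sketch below merges two hypothetical parent graphs (each acyclic), finds the cycle, and then applies one possible dynamic-modeling fix: declaring a single edge lagged and unrolling the graph over time steps. Which edge to lag is a modeling assumption supplied by hand here, not something the code decides.

```python
from graphlib import CycleError, TopologicalSorter


def union(*models):
    """Merge parent DAGs given as {node: set of parents}."""
    merged = {}
    for graph in models:
        for node, parents in graph.items():
            merged.setdefault(node, set()).update(parents)
    return merged


def find_cycle(graph):
    """Return a cycle as a node list (first == last), or None if the graph is acyclic."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError as err:
        return err.args[1]  # graphlib stores the detected cycle here
    return None


def unroll(merged, lagged, steps):
    """Time-index nodes; each (parent, child) in `lagged` links step t-1 to step t."""
    unrolled = {}
    for t in range(steps):
        for node, parents in merged.items():
            same = {(p, t) for p in parents if (p, node) not in lagged}
            prior = {(p, t - 1) for p in parents if (p, node) in lagged} if t else set()
            unrolled[(node, t)] = same | prior
    return unrolled


model_a = {"employment": {"health"}, "income": {"employment"}}  # acyclic on its own
model_b = {"health": {"income"}}                                # acyclic on its own
merged = union(model_a, model_b)
cycle = find_cycle(merged)

# Assumed lag: income affects health one period later.
order = list(TopologicalSorter(unroll(merged, {("income", "health")}, steps=3)).static_order())
```

The unrolled graph is acyclic, so it sorts without error, but it answers a different, time-indexed question than the original join asked.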

Identification under disagreement

Two models can specify conflicting graphs over shared variables. Model A says income → health. Model B says health → income, perhaps mediated by labor-force participation, perhaps by some shared driver that neither names. Both models may have empirical support; both may be valid in their original scope. The primitives detect the disagreement. They cannot adjudicate it. This is where causal inference becomes scientific argument, and the library's role is to surface the argument cleanly, not to resolve it while pretending to be neutral.
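Detection has to look past direct edges, because the disagreement above may run through a mediator. A minimal sketch, assuming graphs are sets of (parent, child) edges: it flags shared pairs where one model places u upstream of v and the other places v upstream of u. The graphs and function names are hypothetical, and the output is a report to surface, not a verdict.

```python
def descendants(edges, node):
    """All nodes reachable from `node` along directed edges (parent, child)."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in edges:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def nodes(edges):
    return {n for edge in edges for n in edge}


def ordering_conflicts(a, b):
    """Shared pairs (u, v): u is upstream of v in model a, v is upstream of u in model b."""
    shared = nodes(a) & nodes(b)
    return {(u, v) for u in shared for v in shared
            if u != v and v in descendants(a, u) and u in descendants(b, v)}


model_a = {("income", "health")}
model_b = {("health", "participation"), ("participation", "income")}

direct = {(u, v) for (u, v) in model_a if (v, u) in model_b}  # misses mediated reversals
conflicts = ordering_conflicts(model_a, model_b)
```

The direct-edge check finds nothing because Model B routes its reversal through participation; the ancestral check finds the income/health disagreement and leaves its resolution to the people arguing about it.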

Where the primitives end

The composition primitives convert silent failures into explicit ones. They do not convert hard problems into easy ones. A library that pretends otherwise is back to producing fluent wrong answers — just with more metadata attached.

Key Terms

Referent

A structured declaration of what a variable actually denotes — unit, basis, population, time grain, source distribution. Distinct from the variable's name. Naming alone cannot align two models.

Bridge Node

A small SCM inserted between two variables with different referents. Carries its own assumptions and scope. Turns an otherwise implicit conversion into an explicit, checkable one.

Assumption Pushforward

The recomputation of a query's identification status in the joined graph, rather than its inheritance from either parent. New seams open new backdoors; pushforward is the primitive that forces them to be addressed.

Scope Intersection

The set of conditions under which all parent models in a composition simultaneously apply. The valid range of any joined query. Can be empty; often much narrower than either parent.

Graph Union

The DAG-level merge of two or more SCMs at shared variables. Necessary for composition but not sufficient — referent alignment and assumption reconciliation are separate operations and fail differently.

Silent Failure

A composition error that produces a syntactically valid, fluent-sounding answer the library cannot detect. The category of error the primitives are designed to convert into explicit failures.

Next Step

Three of the four primitives lean on a concept that has not yet been developed: scope. What does it mean for a model to declare its frame — and what should that declaration look like in machine-readable form?

info@rung3.ai

References

Bareinboim, E. & Pearl, J. (2016). Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences 113(27): 7345–7352.

Koller, D. & Pfeffer, A. (1997). Object-oriented Bayesian networks. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI-97), pp. 302–313.

Laskey, K. B. & Mahoney, S. M. (1997). Network fragments: Representing knowledge for constructing probabilistic models. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI-97).

Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D. & Gebru, T. (2019). Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* '19). arXiv:1810.03993.

Pearl, J. & Bareinboim, E. (2014). External validity: From do-calculus to transportability across populations. Statistical Science 29(4): 579–595.
