An SCM library is only as useful as the granularity at which it lets the LLM compose. Choose the wrong unit of reuse and you get parameter hallucination at one extreme, brittle retrieval at the other. The library's value lives in what it lets the LLM compose — and in what it forces the user to declare.

The argument for the Library

This page covers one component of the Library. For the case for the Library itself (why SCMs, not ML or LLMs, are the framework that justifies a library), see "Why Structured Causal Models?"

One workable framing of neurosymbolic AI: an LLM-mediated library of structured causal models. The LLM does the surface work — recognizing the situation a user describes, finding a model in the library that fits it, instantiating the variables, posing the query, and translating the answer back into natural language. The library does the structural work — supplying the causal graph, the parameters, the assumptions under which a query is identifiable, and the scope conditions under which the model applies at all.

That division of labor is an approximation. Real systems will blur it. But the approximation is useful because it isolates the question that actually matters in research: what should be in the library? Not the retrieval algorithm, not the prompting strategy, not the fine-tune. Those follow. The contents lead.

The unit-of-reuse problem

The library's usefulness lives or dies on how you carve the joints. Retrieval and composition both depend on it. Carve too fine and the LLM is left to assemble everything from primitives — which means hallucinating parameters and structure under the cover of a familiar shape. Carve too coarse and retrieval gets sparse and brittle, with named domain models that almost-but-don't-quite fit the situation in front of the user.

Why the unit matters

Retrieval needs units large enough to match against natural-language situations. Composition needs units small enough to recombine when no single match exists. These pressures pull in opposite directions, and most of the practical pain of building such a library shows up in the gap between them.

Roughly from abstract to concrete, these are the kinds of content I think pay rent in an SCM library. Each addresses a failure mode the others do not.

1. Reasoning gadgets

Small, domain-free causal subgraphs: the confounder triangle, fork, chain, collider, instrumental variable, front-door, M-bias, selection-on-outcome. These are the "design patterns" of causal inference. They have no parameters and no domain content — just structure. The LLM's job is recognizing when a situation instantiates one of them, even when surface vocabulary varies wildly.
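One way to picture a gadget in storage is as an edge set over abstract role names that gets bound to domain variables at instantiation time. The sketch below assumes exactly that representation; the `Gadget` class, its `instantiate` method, and the example variable names are illustrative, not an existing library's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gadget:
    """A parameter-free causal subgraph over abstract roles (hypothetical type)."""
    name: str
    edges: frozenset  # set of (cause_role, effect_role) pairs

    def instantiate(self, binding: dict) -> set:
        """Map abstract roles to domain variables; fail loudly on unbound roles."""
        roles = {role for edge in self.edges for role in edge}
        missing = roles - set(binding)
        if missing:
            raise ValueError(f"unbound roles: {sorted(missing)}")
        return {(binding[cause], binding[effect]) for cause, effect in self.edges}


# Structure only: no parameters, no domain content.
CONFOUNDER = Gadget("confounder triangle",
                    frozenset({("Z", "X"), ("Z", "Y"), ("X", "Y")}))
COLLIDER = Gadget("collider", frozenset({("X", "Z"), ("Y", "Z")}))
CHAIN = Gadget("chain", frozenset({("X", "M"), ("M", "Y")}))

# The mediator's job is to propose the binding; the gadget supplies the shape.
edges = CONFOUNDER.instantiate(
    {"X": "exercise", "Y": "heart_disease", "Z": "age"}
)
```

The point of the design is that the same `CONFOUNDER` object serves any domain: only the binding changes, and an incomplete binding is rejected rather than silently filled in.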

2. Domain mechanism modules

Mid-sized, parameterizable templates that encode how something actually works: dose–response curves, SIR-style contagion, supply–demand with elasticity, exposure–mediator–outcome chains, principal–agent, queueing and bottleneck dynamics, failure-mode trees. These ship with empirical priors over their parameters and explicit scope conditions — the assumptions under which the mechanism is a reasonable approximation of reality.
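A rough sketch of what such a module could look like, using a saturating (Emax-style) dose-response curve as the mechanism. Everything here is an assumption made for illustration: the `MechanismModule` type, the representation of priors as plausible ranges, and the numeric ranges themselves, which are placeholders rather than empirical values.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class MechanismModule:
    """A parameterizable mechanism with priors and scope checks (hypothetical type)."""
    name: str
    priors: Dict[str, Tuple[float, float]]     # parameter -> (low, high) plausible range
    scope: Dict[str, Callable[[float], bool]]  # input name -> validity check
    mechanism: Callable[..., float]

    def evaluate(self, params: Dict[str, float], **inputs: float) -> float:
        # Reject parameters the priors do not support.
        for key, (low, high) in self.priors.items():
            if not low <= params[key] <= high:
                raise ValueError(f"{key}={params[key]} outside [{low}, {high}]")
        # Reject inputs outside the declared scope.
        for key, is_valid in self.scope.items():
            if not is_valid(inputs[key]):
                raise ValueError(f"scope violation on {key}={inputs[key]}")
        return self.mechanism(**params, **inputs)


def emax_response(e_max: float, ed50: float, dose: float) -> float:
    """Saturating dose-response: half of e_max is reached at dose == ed50."""
    return e_max * dose / (ed50 + dose)


DOSE_RESPONSE = MechanismModule(
    name="saturating dose-response",
    priors={"e_max": (0.0, 1.0), "ed50": (1.0, 100.0)},  # placeholder ranges
    scope={"dose": lambda d: d >= 0},
    mechanism=emax_response,
)

effect = DOSE_RESPONSE.evaluate({"e_max": 0.8, "ed50": 10.0}, dose=10.0)
```

Because the priors and scope checks travel with the mechanism, a mediator that proposes an out-of-range parameter gets an error instead of a plausible-looking number.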

3. Identification-tagged query types

For each model in the library, what is identifiable from observational data alone? What requires intervention? What requires additional assumptions — monotonicity, exclusion restrictions, ignorability? Without this metadata you get confident wrong answers, which is worse than no answer at all. Identification status is not a footnote on a model — it is the precondition for any of its outputs being trustworthy.

The fact that identification depends on the question being asked, not just the model, is what ties this directly to Pearl's ladder of causation:

Rung 1: Association

What is observed. Identifiable from data with the weakest assumptions; what most predictive ML lives on.

→ P(Y | X)

Rung 2: Intervention

What would happen if we acted. Requires structural assumptions about which variables are confounded; identifiable only when the graph permits it.

→ P(Y | do(X))

Rung 3: Counterfactual

What would have happened. Requires the strongest assumptions, typically a fully specified structural causal model with exogenous noise distributions.

→ P(Y_x | X', Y')

A library that does not tag its queries by rung will silently let users ask Rung-3 questions of Rung-1 evidence. The output will look fluent. It will also be wrong.
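Rung tagging can be sketched as a gate that sits in front of every query. The version below assumes each library entry records, per rung, the extra assumptions a user must accept before a query at that rung is answerable. `Rung`, `TaggedModel`, `NotIdentified`, and the example entry are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Rung(IntEnum):
    ASSOCIATION = 1
    INTERVENTION = 2
    COUNTERFACTUAL = 3


class NotIdentified(Exception):
    """Raised when a query asks for more than the entry can license."""


@dataclass
class TaggedModel:
    name: str
    # rung -> assumptions that must be accepted before that rung is answerable;
    # a rung with no key has no identification result recorded at all.
    requirements: dict = field(default_factory=dict)

    def check(self, rung: Rung, accepted=frozenset()) -> bool:
        if rung not in self.requirements:
            raise NotIdentified(f"{self.name}: nothing recorded for {rung.name}")
        missing = set(self.requirements[rung]) - set(accepted)
        if missing:
            raise NotIdentified(f"{self.name}: {rung.name} needs {sorted(missing)}")
        return True


# Hypothetical entry: associational queries are free, interventional ones
# need an ignorability assumption, counterfactual ones are not supported.
ENTRY = TaggedModel(
    name="exposure -> outcome (illustrative)",
    requirements={
        Rung.ASSOCIATION: [],
        Rung.INTERVENTION: ["no unmeasured confounding given covariates"],
    },
)

ok = ENTRY.check(Rung.ASSOCIATION)
```

With this shape, a Rung-3 question against this entry fails before any answer is generated, and a Rung-2 question fails until the user has explicitly accepted the listed assumption.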

4. Composition primitives

Types and rules for stitching SCMs together. Two models referring to "income" need either an agreed referent or an explicit reconciliation step. A model whose output is "monthly mortality rate" cannot simply be plugged into another model expecting "annual deaths" without unit alignment. Most academic SCM libraries underinvest here, and it is where I would guess most of the practical pain shows up — composition is where ambiguity becomes error. The dedicated page on composition primitives develops this in detail.
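The mortality example can be made concrete with a small sketch in which model inputs and outputs are typed ports, and mismatched ports can only be joined through an explicit adapter. `Port`, `connect`, and the adapter are hypothetical. The conversion itself assumes a constant monthly rate and a closed population, which is exactly the kind of assumption a reconciliation step should have to state.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Port:
    """A named model input or output with a declared unit (hypothetical type)."""
    variable: str
    unit: str


def connect(output: Port, input_: Port,
            adapter: Optional[Callable[[float], float]] = None
            ) -> Callable[[float], float]:
    """Return the function that carries a value from output to input_."""
    if output == input_:
        return lambda value: value  # same referent, same unit: pass through
    if adapter is None:
        raise TypeError(f"cannot connect {output} to {input_} without an adapter")
    return adapter


def monthly_rate_to_annual_deaths(population: float) -> Callable[[float], float]:
    # Assumes a constant monthly rate and a closed population (illustrative only).
    return lambda monthly_rate: population * (1 - (1 - monthly_rate) ** 12)


upstream = Port("mortality", "monthly rate")
downstream = Port("mortality", "annual deaths")

link = connect(upstream, downstream,
               adapter=monthly_rate_to_annual_deaths(100_000))
annual_deaths = link(0.001)  # slightly below the naive 12 * 0.001 * 100_000
```

Calling `connect(upstream, downstream)` with no adapter raises a `TypeError`, so the ambiguity surfaces at composition time rather than in the final answer.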

5. Scope and validity declarations

When does this model apply? What violates its assumptions? What is outside its frame entirely? This is the difference between a library that helps and a library that confidently misleads. Scope declarations are the negative space of the model — what it explicitly does not claim — and they have to be machine-readable, not buried in a paragraph of prose. The dedicated page on scope and validity develops the four fields a scope declaration has to contain.
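As a loose illustration of "machine-readable", the sketch below turns the three questions above into three checkable fields. This is an assumption made for the example and is not the four-field format developed on the dedicated page. The `ScopeDeclaration` type, the SIR-flavored example, and its population threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ScopeDeclaration:
    applies_when: Dict[str, Callable[[dict], bool]]  # preconditions that must hold
    violated_by: Dict[str, Callable[[dict], bool]]   # conditions that break assumptions
    out_of_frame: List[str]                          # variables the model makes no claim about

    def audit(self, situation: dict) -> List[str]:
        """Return every scope problem found; an empty list means no objection."""
        problems = []
        for label, holds in self.applies_when.items():
            if not holds(situation):
                problems.append(f"precondition not met: {label}")
        for label, triggered in self.violated_by.items():
            if triggered(situation):
                problems.append(f"assumption violated: {label}")
        for variable in self.out_of_frame:
            if variable in situation:
                problems.append(f"outside frame: {variable}")
        return problems


# Illustrative declaration for a contagion module; the threshold is a placeholder.
SIR_SCOPE = ScopeDeclaration(
    applies_when={"population is large": lambda s: s.get("population", 0) >= 1000},
    violated_by={"strong contact heterogeneity": lambda s: s.get("superspreading", False)},
    out_of_frame=["vaccination_policy"],
)

clean = SIR_SCOPE.audit({"population": 50_000})
flagged = SIR_SCOPE.audit(
    {"population": 50, "superspreading": True, "vaccination_policy": "ring"}
)
```

Here `clean` is empty and `flagged` carries three entries, one per field, which is the information a mediator would need to surface before presenting any answer.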

Two considerations are easy to underweight when designing such a library. Both shape the answers to the unit-of-reuse question — and neither is fully solved by adding more content.

Pressure 1 — The LLM-mediator does heavier lifting than it looks

Translating a natural-language situation into a variable assignment and a graph instantiation is not a benign step. It has adversarial failure modes. The model may pick a structure that fits the surface narrative the user gave it, while quietly omitting a confounder the user did not mention — because the user did not know to mention it. The result is a perfectly composed answer to the wrong causal question.

Some of the library's value, then, has to come from forcing explicit declaration of exclusions and scope, not just from supplying structures. Every model needs to ask, in machine-readable form: which variables are you claiming are irrelevant here, and why? The dedicated page on the translation problem develops the primitives that force the LLM mediator to declare what it would otherwise paper over.

Failure mode

The LLM names a structure that matches the user's framing of the problem. The framing omits a confounder. The library has no mechanism to challenge the framing. The output is fluent, internally consistent, and wrong.
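A toy calculation shows how large the gap can be. The numbers below are invented for illustration: a binary confounder Z raises both the chance of treatment X and the chance of outcome Y, while the true effect of X on Y is fixed at 1/10. The sketch compares the observed contrast, which is what a two-variable framing would report, with the backdoor-adjusted contrast.

```python
from fractions import Fraction as F

# Invented distribution: Z confounds X and Y.
p_z = {0: F(1, 2), 1: F(1, 2)}
p_x1_given_z = {0: F(1, 5), 1: F(4, 5)}


def p_y1(x: int, z: int) -> F:
    """P(Y=1 | X=x, Z=z): X adds 1/10, Z adds 1/2."""
    return F(1, 10) + F(1, 10) * x + F(1, 2) * z


def p_x_given_z(x: int, z: int) -> F:
    return p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]


def observed(x: int) -> F:
    """P(Y=1 | X=x): strata are weighted by P(z | x)."""
    p_x = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    joint = sum(p_y1(x, z) * p_x_given_z(x, z) * p_z[z] for z in p_z)
    return joint / p_x


def adjusted(x: int) -> F:
    """P(Y=1 | do(X=x)) by backdoor adjustment: strata are weighted by P(z)."""
    return sum(p_y1(x, z) * p_z[z] for z in p_z)


naive_effect = observed(1) - observed(0)    # Fraction(2, 5)
causal_effect = adjusted(1) - adjusted(0)   # Fraction(1, 10)
```

If the framing never mentions Z, the only computable quantity is `naive_effect`, which overstates the effect fourfold in this example while looking perfectly well formed.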

Pressure 2 — The abstraction-level question is genuinely tricky

Pure gadgets push too much work downstream and invite parameter hallucination. Named domain models get sparse and brittle in retrieval. Neither extreme is the answer. The interesting design space is layered: abstract templates that get instantiated through narrower modules, with provenance tracked through the chain so the eventual output can be audited back to its assumptions. The page on provenance and audit develops the durable form of this tracking.
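Provenance tracking through a chain can be sketched as a value that carries its own trail. The version below assumes each composition step records where it came from and which assumptions it introduced. `Provenance`, `Traced`, and the step labels are hypothetical, and the parameter value is a placeholder.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Provenance:
    source: str                   # gadget or module this step came from
    assumptions: Tuple[str, ...]  # what the step takes for granted


@dataclass(frozen=True)
class Traced:
    """A value plus the ordered trail of steps that produced it."""
    value: float
    trail: Tuple[Provenance, ...] = ()

    def then(self, fn: Callable[[float], float], source: str,
             assumptions: Iterable[str] = ()) -> "Traced":
        step = Provenance(source, tuple(assumptions))
        return Traced(fn(self.value), self.trail + (step,))

    def audit(self) -> List[str]:
        """Every assumption the final value depends on, in order of introduction."""
        return [a for step in self.trail for a in step.assumptions]


result = (
    Traced(10.0)  # a dose, in arbitrary units
    .then(lambda d: d / (10.0 + d), "module: dose-response",
          ["ed50 = 10 (placeholder)"])
    .then(lambda e: 1 - e, "gadget: chain",
          ["no direct dose -> outcome path"])
)
```

Calling `result.audit()` returns both assumptions in the order they entered the chain, so the final number can be traced back to the module and gadget that produced it.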

Too abstract

  • Library is mostly gadgets — fork, chain, collider
  • LLM has to invent parameters from priors
  • Retrieval is broad but composition is hallucinatory
  • Plausible-looking outputs, no anchoring to empirical reality

Layered well

  • Gadgets supply structure; modules supply parameters
  • Provenance is preserved through composition
  • Identification status is computable, not asserted
  • Scope conditions surface before the answer does

Structured Causal Model (SCM)
A formal object specifying a directed graph of causal relationships, structural equations linking variables, and a distribution over exogenous noise. Sufficient for answering questions at all three rungs of Pearl's ladder.

Reasoning Gadget
A small, domain-free causal subgraph: confounder triangle, instrumental variable, front-door, collider, M-bias. The "design patterns" of causal inference.

Identification
Whether a causal quantity can be computed from the available evidence and assumptions. Different rungs require different identification conditions; not every model–query pair is identifiable.

Composition Primitive
The types and rules that govern how two models can be stitched together: variable alignment, unit reconciliation, assumption compatibility checks.

Scope Condition
A machine-readable declaration of when a model applies, what assumptions it relies on, and what lies outside its frame. The negative space of the model.

Unit of Reuse
The granularity at which library contents are stored and retrieved. The central design choice: too fine invites parameter hallucination, too coarse produces brittle retrieval.

Next Step

This page is the typology. The methods arc that follows works it through to operational form: composition, scope, translation, and audit. The arc closes on a vision page — the work the library cannot do — that names what these methods are ultimately preparing material for: people who inhabit a discipline and do the work disciplines have always done, with better materials than they have ever had.

