An SCM library is only as useful as the granularity at which it lets the LLM compose. Choose the wrong unit of reuse and you get parameter hallucination at one extreme, brittle retrieval at the other. The library's value lives in what it lets the LLM compose — and in what it forces the user to declare.
This page covers one component of the Library. For the case for the Library itself (why SCMs, not ML or LLMs, are the framework that justifies a library), see "Why Structured Causal Models?"
The Framing
One workable framing of neurosymbolic AI: an LLM-mediated library of structured causal models. The LLM does the surface work — recognizing the situation a user describes, finding a model in the library that fits it, instantiating the variables, posing the query, and translating the answer back into natural language. The library does the structural work — supplying the causal graph, the parameters, the assumptions under which a query is identifiable, and the scope conditions under which the model applies at all.
That division of labor is an approximation. Real systems will blur it. But the approximation is useful because it isolates the question that actually matters in research: what should be in the library? Not the retrieval algorithm, not the prompting strategy, not the fine-tune. Those follow. The contents lead.
The unit-of-reuse problem
The library's usefulness lives or dies on how you carve the joints. Retrieval and composition both depend on it. Carve too fine and the LLM is left to assemble everything from primitives — which means hallucinating parameters and structure under the cover of a familiar shape. Carve too coarse and retrieval gets sparse and brittle, with named domain models that almost-but-don't-quite fit the situation in front of the user.
Retrieval needs units large enough to match against natural-language situations. Composition needs units small enough to recombine when no single match exists. These pressures pull in opposite directions, and most of the practical pain of building such a library shows up in the gap between them.
Five Categories That Earn Their Keep
Roughly from abstract to concrete, these are the kinds of content I think pay rent in an SCM library. Each addresses a failure mode the others do not.
1. Reasoning gadgets
Small, domain-free causal subgraphs: the confounder triangle, fork, chain, collider, instrumental variable, front-door, M-bias, selection-on-outcome. These are the "design patterns" of causal inference. They have no parameters and no domain content — just structure. The LLM's job is recognizing when a situation instantiates one of them, even when surface vocabulary varies wildly.
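A minimal sketch of what "structure only" could look like in code: each gadget is a parameter-free set of directed edges, and a helper labels the role a middle node plays between two others. The names `GADGETS` and `triple_role` and the placeholder node labels are hypothetical, chosen for illustration rather than taken from any existing library.

```python
# Hypothetical sketch: gadgets as parameter-free sets of directed edges.
# Node labels are placeholders (X = treatment, Y = outcome, others auxiliary).
GADGETS = {
    "chain": {("X", "M"), ("M", "Y")},
    "fork": {("Z", "X"), ("Z", "Y")},
    "collider": {("X", "Z"), ("Y", "Z")},
    "confounder_triangle": {("Z", "X"), ("Z", "Y"), ("X", "Y")},
    "instrumental_variable": {("I", "X"), ("X", "Y"), ("U", "X"), ("U", "Y")},
}


def triple_role(edges, a, mid, b):
    """Classify how `mid` sits between `a` and `b` in a set of directed edges."""
    a_to_mid = (a, mid) in edges
    mid_to_a = (mid, a) in edges
    b_to_mid = (b, mid) in edges
    mid_to_b = (mid, b) in edges
    if a_to_mid and b_to_mid:
        return "collider"   # a -> mid <- b
    if mid_to_a and mid_to_b:
        return "fork"       # a <- mid -> b
    if (a_to_mid and mid_to_b) or (b_to_mid and mid_to_a):
        return "chain"      # a -> mid -> b, or the reverse
    return "none"


for name in ("chain", "fork", "collider"):
    middle = "M" if name == "chain" else "Z"
    print(name, "->", triple_role(GADGETS[name], "X", middle, "Y"))
```

The role matters because it decides what conditioning does: conditioning on the middle of a chain or fork blocks the path, while conditioning on a collider opens it. That is the kind of structural fact a gadget carries without any domain content.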
2. Domain mechanism modules
Mid-sized, parameterizable templates that encode how something actually works: dose–response curves, SIR-style contagion, supply–demand with elasticity, exposure–mediator–outcome chains, principal–agent, queueing and bottleneck dynamics, failure-mode trees. These ship with empirical priors over their parameters and explicit scope conditions — the assumptions under which the mechanism is a reasonable approximation of reality.
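As a rough sketch of what such a module might carry, the example below bundles a saturating dose-response function with parameter ranges and machine-checkable scope conditions. The class `MechanismModule`, its fields, the Emax-style functional form, and every number are assumptions made for illustration; real priors would have to come from the relevant empirical literature.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class MechanismModule:
    """Hypothetical container: mechanism + parameter priors + scope conditions."""
    name: str
    mechanism: Callable[..., float]
    # Parameter name -> (low, high) plausible range. Placeholder values only.
    priors: Dict[str, Tuple[float, float]]
    # Each scope condition is a label plus a predicate over the situation.
    scope: List[Tuple[str, Callable[[dict], bool]]] = field(default_factory=list)

    def scope_violations(self, situation: dict) -> List[str]:
        """Return the labels of every scope condition the situation fails."""
        return [label for label, holds in self.scope if not holds(situation)]


def emax_response(dose: float, emax: float, ec50: float) -> float:
    """Saturating dose-response: effect rises with dose and levels off at emax."""
    return emax * dose / (ec50 + dose)


dose_response = MechanismModule(
    name="dose_response_emax",
    mechanism=emax_response,
    priors={"emax": (0.5, 1.0), "ec50": (5.0, 20.0)},  # illustrative, not empirical
    scope=[
        ("dose is non-negative", lambda s: s["dose"] >= 0),
        ("single exposure route", lambda s: s.get("routes", 1) == 1),
    ],
)

print(dose_response.scope_violations({"dose": 10.0}))
print(dose_response.scope_violations({"dose": -1.0, "routes": 2}))
```

Keeping scope conditions as predicates rather than prose means a violation can be reported before the mechanism is ever evaluated.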
3. Identification-tagged query types
For each model in the library, what is identifiable from observational data alone? What requires intervention? What requires additional assumptions — monotonicity, exclusion restrictions, ignorability? Without this metadata you get confident wrong answers, which is worse than no answer at all. Identification status is not a footnote on a model — it is the precondition for any of its outputs being trustworthy.
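To make "confident wrong answers" concrete, here is a small worked sketch on the confounder triangle with binary variables. All probabilities are invented for illustration, and the adjustment shown is only valid under the stated assumption that Z is the sole confounder and is observed.

```python
# Invented numbers for a confounder triangle: Z -> X, Z -> Y, X -> Y.
p_z = {0: 0.5, 1: 0.5}                      # P(Z = z)
p_x1_given_z = {0: 0.2, 1: 0.8}             # P(X = 1 | Z = z)
p_y1_given_xz = {                           # P(Y = 1 | X = x, Z = z)
    (0, 0): 0.1, (1, 0): 0.2,
    (0, 1): 0.5, (1, 1): 0.6,
}


def p_x_given_z(x, z):
    return p_x1_given_z[z] if x == 1 else 1.0 - p_x1_given_z[z]


def observational(x):
    """P(Y=1 | X=x): strata are weighted by P(z | x), so confounding leaks in."""
    p_x = sum(p_x_given_z(x, z) * p_z[z] for z in p_z)
    joint = sum(p_y1_given_xz[(x, z)] * p_x_given_z(x, z) * p_z[z] for z in p_z)
    return joint / p_x


def interventional(x):
    """P(Y=1 | do(X=x)) via back-door adjustment: strata are weighted by P(z).

    Assumes Z is observed and is the only confounder of X and Y.
    """
    return sum(p_y1_given_xz[(x, z)] * p_z[z] for z in p_z)


naive_contrast = observational(1) - observational(0)      # about 0.34
causal_contrast = interventional(1) - interventional(0)   # about 0.10
print(round(naive_contrast, 2), round(causal_contrast, 2))
```

With these made-up numbers the observational contrast is more than three times the adjusted one. A model that answers the interventional question with the observational quantity produces a fluent number that is simply the answer to a different question.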
The fact that identification depends on the question being asked, not just the model, is what ties this directly to Pearl's ladder of causation:
Rung 1: Association
What is observed. Identifiable from data with the weakest assumptions; what most predictive ML lives on.
→ P(Y | X)
Rung 2: Intervention
What would happen if we acted. Requires structural assumptions about which variables are confounded; identifiable only when the graph permits it.
→ P(Y | do(X))
Rung 3: Counterfactual
What would have happened. Requires the strongest assumptions — typically a fully specified structural causal model with exogenous noise distributions.
→ P(Y_x | X', Y')
A library that does not tag its queries by rung will silently let users ask Rung-3 questions of Rung-1 evidence. The output will look fluent. It will also be wrong.
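One way to picture rung tagging is as an ordered type with a guard that refuses any query sitting higher on the ladder than the model's assumptions support. The sketch below is a minimal version of that idea; `Rung`, `RungMismatch`, and `check_query` are hypothetical names, not an existing API.

```python
from enum import IntEnum


class Rung(IntEnum):
    """Pearl's ladder of causation as an ordered tag."""
    ASSOCIATION = 1      # P(Y | X)
    INTERVENTION = 2     # P(Y | do(X))
    COUNTERFACTUAL = 3   # P(Y_x | X', Y')


class RungMismatch(Exception):
    """Raised when a query asks for more than the model can support."""


def check_query(query_rung: Rung, supported_rung: Rung) -> None:
    """Refuse queries above the highest rung the model's assumptions license."""
    if query_rung > supported_rung:
        raise RungMismatch(
            f"query needs rung {int(query_rung)} ({query_rung.name}) but the "
            f"model supports at most rung {int(supported_rung)} "
            f"({supported_rung.name})"
        )


# A purely predictive model is tagged as supporting association only.
check_query(Rung.ASSOCIATION, supported_rung=Rung.ASSOCIATION)  # passes silently

try:
    check_query(Rung.COUNTERFACTUAL, supported_rung=Rung.ASSOCIATION)
except RungMismatch as err:
    print("refused:", err)
```

The point of the guard is that refusal happens before any answer is generated, so a fluent but unsupported answer never reaches the user.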
4. Composition primitives
Types and rules for stitching SCMs together. Two models referring to "income" need either an agreed referent or an explicit reconciliation step. A model whose output is "monthly mortality rate" cannot simply be plugged into another model expecting "annual deaths" without unit alignment. Most academic SCM libraries underinvest here, and it is where I would guess most of the practical pain shows up — composition is where ambiguity becomes error. The dedicated page on composition primitives develops this in detail.
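The mortality example can be sketched as a typed connection that refuses to pass a value between models unless referent and unit agree or an explicit converter is supplied. `Quantity`, `connect`, the unit strings, and the population figure are all hypothetical, and the conversion assumes a constant monthly rate over a fixed population, which is itself an assumption a real library would need to record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantity:
    value: float
    referent: str   # what is being measured, e.g. "deaths"
    unit: str       # e.g. "per_person_per_month", "count_per_year"


class CompositionError(Exception):
    """Raised when two models cannot be stitched without reconciliation."""


def connect(output, expected_referent, expected_unit, converters=None):
    """Pass `output` downstream only if referent and unit can be aligned."""
    converters = converters or {}
    if output.referent != expected_referent:
        raise CompositionError(
            f"referent mismatch: {output.referent!r} vs {expected_referent!r}"
        )
    if output.unit == expected_unit:
        return output
    key = (output.unit, expected_unit)
    if key not in converters:
        raise CompositionError(f"no converter from {output.unit} to {expected_unit}")
    return Quantity(converters[key](output.value), output.referent, expected_unit)


POPULATION = 100_000  # illustrative fixed population
converters = {
    # Assumes the monthly rate is constant across the year.
    ("per_person_per_month", "count_per_year"): lambda r: r * 12 * POPULATION,
}

monthly = Quantity(0.001, "deaths", "per_person_per_month")
annual = connect(monthly, "deaths", "count_per_year", converters)
print(annual)  # value is approximately 1200 deaths per year
```

Without the converter the call raises instead of silently passing 0.001 into a slot that expects a yearly count, which is the failure the unit check exists to prevent.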
5. Scope and validity declarations
When does this model apply? What violates its assumptions? What is outside its frame entirely? This is the difference between a library that helps and a library that confidently misleads. Scope declarations are the negative space of the model — what it explicitly does not claim — and they have to be machine-readable, not buried in a paragraph of prose. The dedicated page on scope and validity develops the four fields a scope declaration has to contain.
Two Design Pressures
Two considerations are easy to underweight when designing such a library. Both shape the answers to the unit-of-reuse question — and neither is fully solved by adding more content.
Pressure 1 — The LLM-mediator does heavier lifting than it looks
Translating a natural-language situation into a variable assignment and a graph instantiation is not a benign step. It has adversarial failure modes. The model may pick a structure that fits the surface narrative the user gave it, while quietly omitting a confounder the user did not mention — because the user did not know to mention it. The result is a perfectly composed answer to the wrong causal question.
Some of the library's value, then, has to come from forcing explicit declaration of exclusions and scope, not just supplying structures. Every model needs to ask, in machine-readable form: which variables are you claiming are not relevant here, and why? The dedicated page on the translation problem develops the primitives that force the LLM mediator to declare what it would otherwise paper over.
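A minimal sketch of what forced declaration could look like: the library supplies candidate variables for a pattern, and every candidate must either appear in the instantiated model or be excluded with a stated reason. The `Exclusion` record, the `undeclared_variables` helper, and the example variables are hypothetical illustrations, not the primitives developed on the translation page.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Exclusion:
    variable: str
    reason: str   # must be non-empty to count as a declaration


def undeclared_variables(
    candidates: Set[str], included: Set[str], exclusions: List[Exclusion]
) -> Set[str]:
    """Candidates that are neither modelled nor excluded with a reason."""
    justified = {e.variable for e in exclusions if e.reason.strip()}
    return candidates - included - justified


# Candidates the library associates with this pattern (illustrative list).
candidates = {"training", "wages", "prior_education", "motivation"}
# What the mediator instantiated from the user's framing.
included = {"training", "wages"}
exclusions = [
    Exclusion("prior_education", "cohort restricted to one education level"),
    Exclusion("motivation", ""),  # no reason given, so it does not count
]

missing = undeclared_variables(candidates, included, exclusions)
print(sorted(missing))  # ['motivation']
```

A non-empty result is a prompt to go back to the user, not an error to suppress: it marks exactly the place where the framing may have omitted a confounder.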
The LLM names a structure that matches the user's framing of the problem. The framing omits a confounder. The library has no mechanism to challenge the framing. The output is fluent, internally consistent, and wrong.
Pressure 2 — The abstraction-level question is genuinely tricky
Pure gadgets push too much work downstream and invite parameter hallucination. Named domain models get sparse and brittle in retrieval. Neither extreme is the answer. The interesting design space is layered: abstract templates that get instantiated through narrower modules, with provenance tracked through the chain so the eventual output can be audited back to its assumptions. The page on provenance and audit develops the durable form of this tracking.
Too abstract
- Library is mostly gadgets — fork, chain, collider
- LLM has to invent parameters from priors
- Retrieval is broad but composition is hallucinatory
- Plausible-looking outputs, no anchoring to empirical reality
Layered well
- Gadgets supply structure; modules supply parameters
- Provenance is preserved through composition
- Identification status is computable, not asserted
- Scope conditions surface before the answer does
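The layered arrangement can be sketched as a chain of records, each pointing to the more abstract layer it was instantiated from, so that an output can be walked back to every assumption it inherits. The `Layer` record, the `audit_trail` helper, and the example assumptions are hypothetical and stand in for whatever durable form the provenance page develops.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Layer:
    kind: str                        # "gadget", "module", or "instance"
    name: str
    assumptions: Tuple[str, ...]
    parent: Optional["Layer"] = None  # the more abstract layer this came from


def audit_trail(layer: Optional[Layer]) -> List[Tuple[str, str, Tuple[str, ...]]]:
    """Walk from an instance back to its gadget, most abstract layer first."""
    trail = []
    while layer is not None:
        trail.append((layer.kind, layer.name, layer.assumptions))
        layer = layer.parent
    return list(reversed(trail))


# Illustrative chain: gadget -> module -> instance.
gadget = Layer("gadget", "chain", ("X affects Y only through M",))
module = Layer(
    "module", "exposure_mediator_outcome",
    ("mediator is measured without error",), parent=gadget,
)
instance = Layer(
    "instance", "user_query_001",
    ("applies only to the cohort the user described",), parent=module,
)

for kind, name, assumptions in audit_trail(instance):
    print(f"{kind:8s} {name}: {'; '.join(assumptions)}")
```

Because each layer keeps its own assumptions, the final answer can list everything it depends on rather than only what the last step introduced.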
This page is the typology. The methods arc that follows works it through to operational form: composition, scope, translation, and audit. The arc closes on a vision page — the work the library cannot do — that names what these methods are ultimately preparing material for: people who inhabit a discipline and do the work disciplines have always done, with better materials than they have ever had.