A causal model that runs cleanly is not a causal model that applies. The difference between "the math is correct" and "the question is in scope" is the difference between a library that helps and a library that confidently misleads — and only one of those checks is automatic. Scope declarations make the second one automatic too.

The argument for the Library

This page covers one component of the Library. For the case for the Library itself (why SCMs, rather than ML or LLMs, are the framework that justifies a library), see "Why Structured Causal Models?"

Consider a model the library might plausibly contain: a wage-elasticity model fit on local labor-market variation in low-wage retail and food-service sectors, using minimum-wage changes across US states 2005–2018 (the methodological lineage of Card and Krueger). The model identifies the causal effect of moderate minimum-wage increases on employment and prices. Internal validation: clean. Identification: holds under standard assumptions about parallel trends and limited spillover.

A user asks: what would happen to employment under a 50% national minimum-wage hike?

The library finds the model. The model has wage elasticity in its variable set. The query parameter is in the model's vocabulary. The library produces a number. The number reads as a clean policy estimate. It is also wrong — not because the math is wrong but because the regime the model implicitly assumes (small marginal hikes absorbed by employers without structural reorganization) does not extend to the magnitude being asked about. At a 50% national change, employers reorganize, automation thresholds get crossed, prices restructure, and general-equilibrium effects matter that a partial-equilibrium model cannot see. None of that was in the data the model was fit on.

The diagnostic

This is not a measurement error. It is not a parameter error. It is not a computational error. The model is correct on its own terms. The user's question is outside those terms — and the library, lacking explicit scope declarations, has no way to know.

Internal validation only checks the frame. It cannot tell you whether the frame applies. To detect out-of-scope errors, scope has to be declared as a first-class object in the library, and declared along enough dimensions that the things that actually go wrong can be caught. That turns out to require more than population and time period.

A scope declaration is the structured statement of the frame within which a model's causal claims hold. It has four fields. Each is independent: a model can be in-scope on one and out-of-scope on another, and the failure modes do not look alike.

1. Population scope

The most familiar dimension. Who is the model about — what demographic, geographic, sectoral, and temporal cohort does it claim to apply to? The wage-elasticity model's population scope: working-age adults in formal employment in low-wage retail and food-service sectors, US states, 2005–2018.

Population scope is the easiest to declare and the easiest to check automatically. It is also the dimension most likely to produce false confidence — a model whose population scope checks out on the surface (US, working-age, retail) might still fail on regime, measurement, or identification scope without any of the easy checks firing.

2. Regime scope

The structural and institutional context under which the model's causal claims hold. Distinct from population because the same population under different regimes obeys different causal structures.

The wage-elasticity model assumes a regime in which wage hikes are local and small relative to the wage distribution; firms cannot exit the market at the relevant timescale; substitution between labor and capital is limited at the margin; product markets are not contemporaneously restructuring. None of those assumptions appears in the variable set. They are background conditions under which the structural equations were estimated.

Regime scope is what breaks under structural change even when the population is identical. A 50% national hike does not change who is in the labor force; it changes the regime in which their labor is priced. A library that does not separately track regime scope from population scope will silently extrapolate across regime changes. The transportability literature (Pearl & Bareinboim, 2014) formalizes the conditions under which causal effects estimated in one setting carry to another; regime scope is one of the dimensions along which transport can fail.

3. Measurement scope

What was actually measured, with what instrument, under what sampling frame, with what missingness mechanism. The dimension that breaks when "income" in production data means something operationally different from "income" in the training data.

This is the connection back to referent declarations. Where a referent declares the quantity a variable denotes — unit, basis, population, time grain — measurement scope declares the operational definition under which the model's training data was collected: the survey instrument, the response rate, the kinds of non-response that were systematic, the proxies used when the underlying quantity was unobservable. A model trained on payroll-reported wages cannot be applied without translation to a population whose wages are largely tipped or informal.

4. Identification scope

The most subtle field, and the one that makes scope a per-query property rather than a per-model property.

A model can be in-scope on population, regime, and measurement, and still produce a wrong answer because the specific query being asked is not identifiable from the model's structure. A model that cleanly identifies P(employment | wage) from observational data does not, on the same data, identify P(employment | do(wage)) without further assumptions. Different rungs of Pearl's ladder require different identification assumptions; the same model can be in-scope for one rung and out-of-scope for another over the same variables.
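To make the gap between the two rungs concrete, consider one standard case, assumed here purely for illustration: a set of observed covariates $Z$ satisfies the backdoor criterion relative to wage $W$ and employment $E$. (The wage-elasticity model on this page relies on difference-in-differences rather than backdoor adjustment; the symbols $W$, $E$, and $Z$ are introduced only for this sketch.) Under that assumption the interventional quantity is identified by adjustment, and it weights the strata of $Z$ differently from the plain conditional:

```latex
\begin{align}
P(E \mid do(W = w)) &= \sum_{z} P(E \mid W = w, Z = z)\, P(Z = z) \\
P(E \mid W = w)     &= \sum_{z} P(E \mid W = w, Z = z)\, P(Z = z \mid W = w)
\end{align}
```

The two expressions agree when $Z$ is independent of $W$, so that $P(Z = z \mid W = w) = P(Z = z)$. Otherwise they generally differ, which is why a Rung-1 tag does not imply a Rung-2 tag over the same variables.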

Identification scope tags each model with the queries it can answer. Rung-1 only? Rung-2 under specific assumptions? Rung-3 with a fully specified structural model? The library refuses to answer queries outside the model's identification scope — even when population, regime, and measurement all check out. This is the field that ties scope most directly to Pearl's ladder: it makes the rung of a query a scope variable, not just a methodological label.
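A minimal sketch of identification scope as a per-query check. The `Rung` enum and `IdentificationScope` class are hypothetical names chosen for illustration, not the Library's actual interface. Each answerable rung is tagged with the assumption it relies on; a rung with no tag is treated as out of scope.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, Tuple


class Rung(IntEnum):
    """Rungs of Pearl's ladder (hypothetical encoding)."""
    ASSOCIATION = 1      # P(y | x)
    INTERVENTION = 2     # P(y | do(x))
    COUNTERFACTUAL = 3   # unit-level "what if" queries


@dataclass
class IdentificationScope:
    # Maps each answerable rung to the assumption it relies on.
    # A rung absent from the mapping is out of scope.
    identified: Dict[Rung, str] = field(default_factory=dict)

    def check(self, rung: Rung) -> Tuple[bool, str]:
        """Return (in_scope, reason) for a query at the given rung."""
        if rung in self.identified:
            return True, f"rung-{int(rung)} identified under {self.identified[rung]}"
        return False, f"rung-{int(rung)} is outside the declared identification scope"


# Tags follow the wage-elasticity example used on this page.
wage_model = IdentificationScope({
    Rung.ASSOCIATION: "observational data",
    Rung.INTERVENTION: "difference-in-differences",
})

print(wage_model.check(Rung.INTERVENTION))    # in scope
print(wage_model.check(Rung.COUNTERFACTUAL))  # declined
```

Keeping the assumption string next to each rung means a decline or an answer can always say which assumption it rested on.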

| Field | Declares | Silent failure | Library check |
|---|---|---|---|
| Population | Cohort: who, where, when | Surface match obscures dimension mismatch | Match query population against declared cohort |
| Regime | Structural/institutional context | Magnitude or kind of change exceeds assumed regime | Compare query parameters against training distribution |
| Measurement | Instrument, sampling frame, missingness | Same name, different operational definition | Cross-reference with referent declarations |
| Identification | Queries the model can answer, per rung | Right model, wrong question | Resolve query against tagged identification class |

Apply the four fields to the wage-elasticity case introduced at the top of the page. The model declares its scope along all four dimensions, in machine-readable form:

Population scope
  cohort: working-age adults
  employment: formal sector
  sectors: retail, food service
  geography: US states
  period: 2005–2018

Regime scope
  change kind: state-local minimum-wage
  change magnitude: 5–25% above prior
  market structure: stable; no exits/automation
  spillover: limited across state borders

Measurement scope
  wage source: CPS-MORG, hourly
  employment source: QCEW, payroll
  tipped wages: excluded from base
  missingness: low for formal sector

Identification scope
  rung-1: P(emp | wage) ✓
  rung-2: P(emp | do(wage)) ✓ under DiD
  rung-3: requires structural noise ✗
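One way such a declaration might be encoded, sketched with Python dataclasses. The class and field names are assumptions made for illustration; the values are copied from the declaration above, with the magnitude range written as fractions.

```python
from dataclasses import asdict, dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class PopulationScope:
    cohort: str
    employment: str
    sectors: Tuple[str, ...]
    geography: str
    period: Tuple[int, int]                 # inclusive years


@dataclass(frozen=True)
class RegimeScope:
    change_kind: str
    change_magnitude: Tuple[float, float]   # fraction above prior wage
    market_structure: str
    spillover: str


@dataclass(frozen=True)
class MeasurementScope:
    wage_source: str
    employment_source: str
    tipped_wages: str
    missingness: str


@dataclass(frozen=True)
class ScopeDeclaration:
    population: PopulationScope
    regime: RegimeScope
    measurement: MeasurementScope
    # rung number -> what is identified, or None when not identified
    identification: Dict[int, Optional[str]]


WAGE_ELASTICITY = ScopeDeclaration(
    population=PopulationScope(
        cohort="working-age adults",
        employment="formal sector",
        sectors=("retail", "food service"),
        geography="US states",
        period=(2005, 2018),
    ),
    regime=RegimeScope(
        change_kind="state-local minimum-wage",
        change_magnitude=(0.05, 0.25),
        market_structure="stable; no exits/automation",
        spillover="limited across state borders",
    ),
    measurement=MeasurementScope(
        wage_source="CPS-MORG, hourly",
        employment_source="QCEW, payroll",
        tipped_wages="excluded from base",
        missingness="low for formal sector",
    ),
    identification={1: "P(emp | wage)", 2: "P(emp | do(wage)) under DiD", 3: None},
)

# asdict() gives a plain nested dict, convenient for serialization.
print(asdict(WAGE_ELASTICITY)["regime"])
```

Typed fields, rather than free text, are what let each of the four checks compare a query against the declaration mechanically.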

The user asks: what would happen to employment under a 50% national minimum-wage hike? The library checks all four fields before deciding whether to answer.

1. Population scope: Pass

Query target population is US working-age adults in formal employment. Matches the model's declared cohort. The library does not flag.

2. Regime scope: Fail

Query magnitude is a 50% national change. The model's declared regime covers state-local changes in the 5–25% range. The query exceeds the regime by both kind (local → national) and magnitude (25% → 50%). The library flags: regime scope violated.

The flag is not a refusal to think about the question — it is a refusal to silently answer it with this model. The library reports the violation, declines to produce a number, and offers two paths: refine the query into the model's regime, or seek a different model whose regime scope covers national-scale structural changes.
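The regime check just described can be sketched as follows. `RegimeScope`, `Query`, and `check_regime` are hypothetical names, and the check covers only the two declared dimensions used in this example (kind and magnitude of change).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RegimeScope:
    change_kind: str                         # e.g. "state-local"
    change_magnitude: Tuple[float, float]    # declared range, as fractions


@dataclass(frozen=True)
class Query:
    change_kind: str
    change_magnitude: float


def check_regime(scope: RegimeScope, query: Query) -> List[str]:
    """Return a list of regime violations; an empty list means the check passes."""
    violations = []
    if query.change_kind != scope.change_kind:
        violations.append(
            f"kind: query is {query.change_kind}, model covers {scope.change_kind}"
        )
    low, high = scope.change_magnitude
    if not (low <= query.change_magnitude <= high):
        violations.append(
            f"magnitude: query is {query.change_magnitude:.0%}, "
            f"model covers {low:.0%} to {high:.0%}"
        )
    return violations


scope = RegimeScope("state-local", (0.05, 0.25))
national_50 = Query("national", 0.50)
local_15 = Query("state-local", 0.15)

for query in (national_50, local_15):
    violations = check_regime(scope, query)
    if violations:
        # Decline with specific reasons instead of producing a number.
        print("regime scope violated:", "; ".join(violations))
    else:
        print("regime scope passes")
```

Returning the list of violations, rather than a bare boolean, is what lets the library offer the user a refined query that falls inside the declared regime.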

3. Measurement scope: Conditional

Suppose the user accepts the regime constraint and refines to a 15% local hike in a tipped sector. The library now checks measurement: the model's wage variable excludes tipped income from the base, while the user's sector reports wages with tips folded in. The library cross-references against the referent declarations from the Composition page and either inserts a bridge node or flags a measurement mismatch.
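A rough sketch of that measurement check, assuming a hypothetical `WageDefinition` record and a hypothetical registry of bridge nodes keyed by (query definition, model definition). The source label and bridge name for the user's sector are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class WageDefinition:
    """Operational definition of a wage variable (hypothetical fields)."""
    source: str
    includes_tips: bool


def check_measurement(
    model_def: WageDefinition,
    query_def: WageDefinition,
    bridges: Dict[Tuple[WageDefinition, WageDefinition], str],
) -> Tuple[str, Optional[str]]:
    """Return (status, detail), where status is 'pass', 'bridge', or 'mismatch'."""
    if model_def == query_def:
        return "pass", None
    # A bridge translates the query's definition into the model's.
    bridge = bridges.get((query_def, model_def))
    if bridge is not None:
        return "bridge", bridge
    return "mismatch", (
        f"query wage ({query_def.source}, tips included: {query_def.includes_tips}) "
        f"differs from model wage ({model_def.source}, "
        f"tips included: {model_def.includes_tips})"
    )


model_wage = WageDefinition(source="CPS-MORG, hourly", includes_tips=False)
query_wage = WageDefinition(source="sector-reported, hourly", includes_tips=True)

# Without a registered bridge the library flags the mismatch.
print(check_measurement(model_wage, query_wage, bridges={}))

# With a (hypothetical) bridge node registered, the library inserts it instead.
bridges = {(query_wage, model_wage): "strip_tips_from_hourly_wage"}
print(check_measurement(model_wage, query_wage, bridges))
```

The comparison is on operational definitions, not variable names, so two variables both called "wage" can still fail the check.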

4. Identification scope: Per-query

Even with population, regime, and measurement aligned, the library re-checks at the level of the specific query. "Will this hike cause employment to fall?" is a Rung-2 question requiring P(emp | do(wage)). The model identifies this under difference-in-differences. If the user instead asked "would employment have fallen even without the hike?" — a Rung-3 counterfactual — the model is out-of-scope, and the library declines despite the other three fields passing.

The model was not broken. The first version of the user's question was outside the model's frame on the regime dimension; the second version was inside the frame but exposed a measurement mismatch; the third version surfaced a per-rung identification check the user did not know to ask about. At every stage the library either answered with the active scope reported, or declined with a specific reason. The naive library, the one without scope as a first-class object, would have produced a number for each of those three versions, all fluent, each wrong in a different way.

Scope earns its place in the library only when it is more than metadata. A library that treats scope as documentation has not done the work; a library that treats scope as input to four operations has. Those operations are what turn declared scope into enforced scope.

Retrieval

A user query carries implicit scope: a target population, a time horizon, a regime context, a question type. The library matches query scope against candidate models' declared scopes during retrieval, not after. Models that are out-of-scope on the obvious dimensions never reach the candidate set. This is the cheapest scope check, and it is also the most underused — most retrieval systems match on topic, not on frame.
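A minimal sketch of scope-aware retrieval, under the assumption that both queries and library entries expose a few comparable scope fields. `ModelEntry`, `QueryScope`, and `retrieve` are hypothetical names, and the second library entry is invented for contrast.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ModelEntry:
    name: str
    topic: str
    geography: str
    sectors: FrozenSet[str]
    rungs: FrozenSet[int]      # rungs the model is tagged as able to answer


@dataclass(frozen=True)
class QueryScope:
    topic: str
    geography: str
    sector: str
    rung: int


def retrieve(query: QueryScope, library: List[ModelEntry]) -> List[ModelEntry]:
    """Keep only models whose declared scope covers the query's implicit scope."""
    return [
        m for m in library
        if m.topic == query.topic           # topic match alone is not enough
        and m.geography == query.geography
        and query.sector in m.sectors
        and query.rung in m.rungs
    ]


library = [
    ModelEntry("us-retail-wage-elasticity", "minimum wage", "US",
               frozenset({"retail", "food service"}), frozenset({1, 2})),
    # Hypothetical second entry: same topic, different frame.
    ModelEntry("other-region-wage-model", "minimum wage", "non-US",
               frozenset({"manufacturing"}), frozenset({1})),
]

query = QueryScope("minimum wage", "US", "retail", rung=2)
print([m.name for m in retrieve(query, library)])
```

Both entries match on topic; only the scope fields separate them, which is the point of filtering on frame during retrieval rather than afterwards.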

Composition

When two or more models are joined, the composed model's scope is the intersection of its parents' scopes. Scope intersection is one of the four composition primitives; this page gives it a foundation. The intersection can be empty, narrower than either parent, or further narrowed by identification incompatibilities introduced in the join. The data-fusion literature (Bareinboim & Pearl, 2016) develops the rigorous theory for fusing causal evidence across heterogeneous sources; the library's job is to expose the operational form. The library reports the resulting active scope before any composed query is answered.
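As an illustration only, scope intersection over three simple fields might look like the sketch below. The `Scope` class and the second and third models are hypothetical. The sketch does not model the identification incompatibilities that a join itself can introduce; it only intersects what each parent declares.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class Scope:
    sectors: FrozenSet[str]
    period: Tuple[int, int]     # inclusive years
    rungs: FrozenSet[int]


def intersect(a: Scope, b: Scope) -> Optional[Scope]:
    """Return the scope shared by both parents, or None if it is empty."""
    sectors = a.sectors & b.sectors
    start, end = max(a.period[0], b.period[0]), min(a.period[1], b.period[1])
    rungs = a.rungs & b.rungs
    if not sectors or start > end or not rungs:
        return None
    return Scope(sectors, (start, end), rungs)


wage_model = Scope(frozenset({"retail", "food service"}), (2005, 2018), frozenset({1, 2}))
# Hypothetical models, invented for illustration.
price_model = Scope(frozenset({"food service", "hospitality"}), (2010, 2022), frozenset({1}))
disjoint_model = Scope(frozenset({"manufacturing"}), (2019, 2023), frozenset({1, 2}))

print(intersect(wage_model, price_model))     # narrower than either parent
print(intersect(wage_model, disjoint_model))  # None: empty intersection
```

In the first case the composed scope is food service only, 2010 to 2018, Rung-1 only, which is narrower than either parent on every field.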

Querying

Identification scope is per-query, so each ask is re-scoped at query time. A model in-scope on population, regime, and measurement can still be out-of-scope on identification for the specific question being asked. The library does not assume that a previously answered query of one rung implies anything about the next query of a different rung over the same model.

Reporting

The active scope is reported alongside any answer the library produces. Not in a footnote — inline, at the same level of prominence as the answer itself. A user reading the result can immediately see what was assumed, what was narrowed, and where the answer stops being valid. No scope is silently inherited; every answer carries its scope with it. The lineage here is the model-cards tradition (Mitchell et al., 2019), generalized from documentation that travels with a trained model to scope that the library actively enforces and reports per query.
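A sketch of an answer object that cannot be rendered without its scope. `ScopedAnswer` and its fields are hypothetical, and the output layout is one possible choice rather than a prescribed format.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ScopedAnswer:
    active_scope: Dict[str, str]
    estimate: Optional[float] = None        # None when the library declines
    declined_reason: Optional[str] = None

    def render(self) -> str:
        """Render the result with its scope at the same level, not as a footnote."""
        if self.estimate is None:
            head = f"DECLINED: {self.declined_reason}"
        else:
            head = f"ESTIMATE: {self.estimate}"
        lines = [head, "ACTIVE SCOPE:"]
        lines += [f"  {name}: {value}" for name, value in self.active_scope.items()]
        return "\n".join(lines)


scope = {
    "population": "US working-age adults, formal retail and food service",
    "regime": "state-local changes, 5-25% above prior",
    "measurement": "CPS-MORG hourly wages, tips excluded",
    "identification": "rung-2 under DiD",
}

declined = ScopedAnswer(scope, declined_reason="regime scope violated (national, 50%)")
print(declined.render())
```

Because `active_scope` is a required field and `render` always prints it, there is no code path that returns a bare number.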

Metadata vs. machinery

Scope declarations are the difference between metadata and machinery. A library that records scope but does not act on it has done documentation; a library that uses scope as input to retrieval, composition, querying, and reporting has done engineering. Only the second kind keeps the LLM mediator honest.

Scope, like every other library primitive, addresses a category of failure rather than the whole problem. Four failure modes lie outside what scope checks can catch on their own. Pretending otherwise is its own dishonesty.

Misdeclaration

A modeler claims wider scope than the model warrants. Every downstream check trusts the declaration; if the declaration is wrong, the checks pass and the answers are wrong. Misdeclaration is the hardest failure mode because it sits upstream of the library entirely. Mitigations — cross-validation against held-out populations, peer review, scope audits, scope-narrowing priors — exist, but none are automatic, and the library cannot detect misdeclaration without external evidence.

Contestation

Two domain experts disagree on whether a model applies to a given query. One reads the regime as continuous with the model's training distribution; the other reads it as a structural break. The library can surface the disagreement and present both views as competing scope claims. It cannot adjudicate. This is where scope becomes scientific argument, and the library's job is to make the argument visible, not to resolve it while pretending to be neutral.

Decay

Scope shrinks as the world changes. A model's measurement scope assumed iOS app-attribution worked; iOS 14 broke that. A model's regime scope assumed a stable interest-rate environment; the regime changed. Scope-decay primitives — automatic detection of when a model's scope has narrowed since training, and revocation of in-scope claims that no longer hold — are not standard library infrastructure. They should be. They are not yet.

Gaming

The LLM mediator paraphrases the user's question to fit the model's scope, defeating the check. The user asks about a 50% hike; the mediator silently reformulates the query as "small marginal hike, scaled" so the regime check passes. Every other primitive on this page assumes the query reaches the library faithfully. The mediator stands between the user and the library. Whether it tells the truth at that interface is a different problem.

Where scope ends

The four failure modes above are not scope's fault. They mark the boundary at which scope as a primitive ends and other primitives begin — provenance and audit for misdeclaration and decay, scientific deliberation for contestation, and translation for gaming. The next page treats translation directly.

Population Scope
The cohort — demographic, sectoral, geographic, temporal — to which a model's claims apply. The most familiar dimension and the easiest to check; not sufficient on its own.
Regime Scope
The structural and institutional context under which the model's causal equations hold. Breaks under structural change even when the population is identical. The dimension most likely to be silently extrapolated.
Measurement Scope
The operational definitions under which the training data was collected — instrument, sampling frame, missingness, proxies. Distinct from referents (which name the quantity); paired with them in any composition.
Identification Scope
A per-query property: which queries, at which rungs of Pearl's ladder, the model can answer under its assumptions. The field that makes scope per-query rather than per-model.
Transportability
The technical term in the causal-inference literature for the conditions under which causal effects estimated in one setting carry to another. Scope intersection is a library's operational implementation of transportability across composed models.
Out-of-Scope Error
A wrong answer produced by an internally valid model applied outside its declared frame. The category of error scope declarations are designed to convert into refusals or narrowed answers.
Scope Decay
The narrowing of a model's effective scope as the world changes after training. Currently outside what scope primitives detect automatically — flagged here as future work.
Next Step

Scope is checkable in principle. But the LLM mediator stands between the user's natural-language question and the model's machine-readable scope — and can paraphrase the question to make any check pass. That is a translation problem, and it gets its own page.

References

Pearl, J. & Bareinboim, E. (2014). External validity: From do-calculus to transportability across populations. Statistical Science 29(4): 579–595.

Bareinboim, E. & Pearl, J. (2016). Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences 113(27): 7345–7352.

Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D. & Gebru, T. (2019). Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* '19): 220–229. arXiv:1810.03993.
