Composition and Scope both check what the library does with a query. Translation is what happens before that — and what happens before is where most of the actual error budget lives. The translation primitives let the library see, and refuse, what would otherwise be silent.

The argument for the Library

This page covers one component of the Library. For the case for the Library itself — why SCMs, not ML or LLMs, are the framework that justifies a library — see Why Structured Causal Models?

Every primitive on every prior page has assumed something this page will treat as suspect: that the query reaching the library faithfully represents the question the user asked. Composition checks for referent mismatches between models, but only between what was submitted as inputs to the join. Scope checks whether the query is in the model's frame, but only checks whichever query the mediator submitted. Both leave a gap at the front of the system: the user asks a question in natural language, the LLM mediator translates it into a structured library call, and the structured call is what gets checked. The translation itself is checked by no one.

Three failure modes live in that gap.

Frame shift

The mediator submits a query subtly different from the user's actual question. Different magnitude, different population, different time horizon. Every individual word in the mediator's structured call is plausible; the cumulative shift is real. The library answers a different question, fluently, and the user reads the answer as if it answered theirs.

Rung shift

The user asks a counterfactual; the mediator queries the library observationally. "What would have happened if we hadn't raised the minimum wage?" becomes "What's the historical correlation between minimum-wage levels and employment?" Both questions have answers. The answers are different. Pearl's ladder distinguishes the rungs (Pearl & Bareinboim, 2014); the user did not specify which rung; the mediator did not ask.

Scope laundering

The failure mode the Scope page named at the end of §5. The mediator silently reformulates the user's question to make scope checks pass. The user asks about a 50% national hike; the mediator submits "an aggregate of state-level increases." The library's regime-scope check passes because the structured call is in-frame. The natural-language question was not.

The diagnostic

These three failure modes are not addressable by writing better prompts. The user is naive about what got lost in translation — they don't know what to check for, and by construction, anyone who could fully audit the translation would not need the library. The library is the only entity in the system that has both the user's natural-language framing and the structured query, and so the library is the only place where the comparison can happen.

"Primitives" is the same word used on the Composition page, on purpose. These are the library's basic operations at the translation interface, parallel to the composition primitives at the join interface. Each one forces the mediator to declare what it would otherwise leave implicit.

1. Query schema

The library does not accept natural-language strings as queries. It accepts a structured object with explicit slots: target population, intervention specification, outcome variable, query rung, time horizon, scope qualifiers. The schema is the artifact; the schema's slots are what the mediator must commit to. Strings are too soft to check; schemas are checkable.

This is the foundational primitive. Without it, the other three have nothing to operate on. With it, the rest follow.
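A minimal sketch of what such a schema could look like in Python. Only the slot list comes from the text above; the names `QuerySchema` and `Rung`, the field names, and the validation rules are illustrative assumptions, not the library's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from enum import IntEnum


class Rung(IntEnum):
    """Pearl's ladder; this numbering is an illustrative convention."""
    OBSERVATIONAL = 1
    INTERVENTIONAL = 2
    COUNTERFACTUAL = 3


@dataclass(frozen=True)
class QuerySchema:
    """Hypothetical structured query: one field per slot named in the text."""
    population: str
    intervention: str
    outcome: str
    rung: Rung
    time_horizon: str
    scope_qualifiers: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Every required slot must carry a commitment; empty values are rejected.
        for f in fields(self):
            value = getattr(self, f.name)
            if f.name != "scope_qualifiers" and value in ("", None):
                raise ValueError(f"slot '{f.name}' must be filled")
        # The rung is a typed commitment, not free text.
        if not isinstance(self.rung, Rung):
            raise TypeError("rung must be a Rung")
```

The point of the sketch is that an unfilled or free-text slot fails at construction time, before any model is consulted.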

2. Back-translation

Borrowed from human-translation practice: before producing an answer, the library has the mediator restate the structured query in natural language using the model's variable names and declared scope, and surfaces that restatement to the user. The user's job at this checkpoint is small but decisive: not to validate the answer — only to confirm the question. The library does not proceed without this acknowledgment.

Back-translation is rarely done in LLM systems. It is also the cheapest way to catch frame shift, because it transfers the comparison from the library (which doesn't know what the user meant) to the user (who does).
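The confirmation gate can be sketched as follows. This is an assumption-laden illustration: the template in `back_translate`, the dictionary keys, and the `confirm` and `answer` callbacks are hypothetical stand-ins for however the restatement is actually produced and shown to the user.

```python
from typing import Callable, Optional


def back_translate(query: dict[str, str]) -> str:
    """Render a structured query as a plain-language question (hypothetical template)."""
    return (
        f"What is the effect on {query['outcome']} in {query['population']} "
        f"of {query['intervention']}, over {query['time_horizon']}?"
    )


def answer_if_confirmed(
    query: dict[str, str],
    confirm: Callable[[str], bool],
    answer: Callable[[dict[str, str]], str],
) -> Optional[str]:
    """Call `answer` only after the user confirms the restated question."""
    restated = back_translate(query)
    if not confirm(restated):
        return None  # the user rejected the restatement, so nothing is answered
    return answer(query)
```

Because `answer` is only reachable through `confirm`, the mediator has no path to an answer that skips the user's acknowledgment.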

3. Frame-shift detection

The library compares the user's original natural-language framing against the structured query the mediator produced, looking for systematic shifts: magnitude rescaling, population narrowing, regime substitution, rung change. Some are mechanically detectable — magnitudes and units can be checked, rungs can be checked. Some require heuristics and a flag-rather-than-block default. The point is that the comparison happens at all. Most LLM systems treat the prompt as input and the structured call as output, and never compare them.

Frame-shift detection complements back-translation. Back-translation depends on the user catching the shift; frame-shift detection depends on the library catching it. Either alone is fragile; together they are durable.
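A rough sketch of the comparison, mixing one mechanical check (percent magnitudes) with two deliberately crude keyword heuristics (regime and rung). The function names, the `magnitude_pct`, `regime`, and `rung` keys, and the cue phrases are all assumptions made for illustration; a real detector would need far richer parsing.

```python
import re


def extract_percentages(text: str) -> list[float]:
    """Pull every 'N%' figure out of a natural-language question."""
    return [float(m) for m in re.findall(r"(\d+(?:\.\d+)?)\s*%", text)]


def detect_frame_shifts(original: str, query: dict) -> list[str]:
    """Return warnings for slots that diverge from the original framing."""
    warnings: list[str] = []
    text = original.lower()

    # Mechanical check: a stated magnitude must survive translation.
    stated = extract_percentages(original)
    submitted = query.get("magnitude_pct")
    if stated and submitted is not None and submitted not in stated:
        warnings.append(f"magnitude: {stated[0]:g}% -> {submitted:g}%")

    # Heuristic check (flag, do not block): national question, state-level query.
    if "national" in text and "state-level" in str(query.get("regime", "")).lower():
        warnings.append("regime: national -> state-level")

    # Heuristic check: counterfactual phrasing submitted below rung 3.
    if "would have" in text and query.get("rung") != 3:
        warnings.append(f"rung: counterfactual -> rung {query.get('rung')}")

    return warnings
```

The function returns warnings rather than raising, which matches the flag-rather-than-block default for the heuristic cases.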

4. Refusal-as-information

When the mediator cannot faithfully translate the user's question into the library's vocabulary — variables not in any model, query types outside any identification scope, scope conditions that no model declares — the library refuses, and the refusal itself is information: what variable is missing, what model would be needed, what assumption would have to hold. The library does not produce a fluent guess in place of a faithful translation. Refusal is a first-class output, not a fallback.

This is the primitive that breaks the otherwise-irresistible pull toward fluency. The LLM mediator is, by training and architecture, biased to produce some answer. The library has to be biased the other way — toward refusing, when refusal is the truthful response.
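One way to make refusal a first-class output is to return it as an ordinary value instead of raising an error. The sketch below assumes a hypothetical `Refusal` record and a hypothetical `resolve_variables` step; the field names mirror the three kinds of information the text lists, and nothing here is the library's real interface.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Refusal:
    """Hypothetical refusal record: a return value, not an exception."""
    reason: str
    missing_variables: tuple[str, ...] = ()
    model_needed: str = ""
    assumptions_required: tuple[str, ...] = ()


def resolve_variables(requested: list[str], known: set[str]) -> list[str] | Refusal:
    """Return the requested variables, or a Refusal naming what is missing."""
    missing = tuple(v for v in requested if v not in known)
    if missing:
        return Refusal(
            reason="variables not present in any model",
            missing_variables=missing,
            model_needed="a model that defines: " + ", ".join(missing),
        )
    return list(requested)
```

Callers then have to handle the `Refusal` branch explicitly, which is harder to paper over with a fluent guess than a swallowed exception.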

Primitive | Forces the mediator to | Catches
Query schema | Commit to structured slots before submitting | Soft / unstructured queries that hide assumptions
Back-translation | Restate the query to the user before answering | Frame shift, rung shift
Frame-shift detection | Submit to comparison against the original | Scope laundering, magnitude rescaling, regime substitution
Refusal-as-information | Surface what is missing rather than substitute | Fluent guesses produced in place of faithful translations

Continue the wage-elasticity case from the Scope page. The user originally asked about a 50% national minimum-wage hike; the library, applying scope checks, refused on regime-scope grounds and offered to refine the question or seek a different model. What happens next is where translation primitives earn their place.

The mediator, having seen the refusal, attempts to re-translate the user's question into a query that will pass. It does not consult the user. It submits something like this:

User's original question

"What would happen to employment under a 50% national minimum-wage hike?"

↓ mediator translates ↓
Mediator's structured query
population: US, working-age, formal employment
intervention: minimum-wage increase
magnitude: 12% (aggregated across state-level changes)
regime: state-level, decentralized
outcome: employment
rung: 2 — P(emp | do(wage))
↳ Frame shift detected: magnitude 50% → 12%; regime national → state-level

The structured call is in-frame. Without translation primitives, the library would answer it. With them, four checks fire in sequence.

1. Query schema

The mediator's submission is a populated schema, not a string. Every slot is filled; every commitment is explicit. This is the precondition for the next three checks. Without the schema, the comparison would have nothing to compare against.

2. Back-translation

The library renders the structured query in plain language, using the model's exact variable definitions: "What is the effect on US retail and food-service employment of an aggregated 12% increase in state-level minimum wages, where each state's increase is independent of the others?" The user reads this. The user's actual question was about a single national 50% hike. The substitution is now visible to the user, who can correct it.

3. Frame-shift detection

Independent of the user's response, the library compares the original framing ("50% national hike") against the structured query ("12% aggregated state-level"). Magnitude divergence: 50% versus 12%, mechanically flagged. Regime divergence: national versus state-level, substitution flagged. The library raises both shifts as machine-readable warnings — they sit alongside the back-translation, not after it.

4. Refusal-as-information

The library declines to answer the substituted query. The refusal includes the diagnosis: the user's original question requires a model whose regime scope covers national-scale, structural-magnitude changes; no such model is currently in the library. The user receives a refusal that names what is missing and what would be needed — not a fluent number that would have answered a different question.
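How the check results might combine into a single outcome for the wage case can be sketched like this. The `Shift` record, the `decide` function, and the policy it encodes (answer only when nothing was flagged and the user confirmed the restatement) are assumptions for illustration; the text does not specify the library's actual decision rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shift:
    slot: str       # e.g. "magnitude" or "regime"
    original: str   # what the user's framing said
    submitted: str  # what the mediator's query says


def decide(shifts: list[Shift], user_confirmed: bool, model_covers_original: bool) -> dict:
    """Combine check results into an outcome (one possible policy)."""
    if not shifts and user_confirmed:
        return {"status": "answer"}
    diagnosis = [f"{s.slot}: {s.original} -> {s.submitted}" for s in shifts]
    if not model_covers_original:
        diagnosis.append("no model in the library covers the original framing")
    return {"status": "refused", "diagnosis": diagnosis}


# The wage case: two flagged shifts, no user confirmation, no covering model.
wage_case = decide(
    shifts=[
        Shift("magnitude", "50%", "12%"),
        Shift("regime", "national", "state-level"),
    ],
    user_confirmed=False,
    model_covers_original=False,
)
```

Under this policy `wage_case` is a refusal whose diagnosis lists both shifts and the missing model, rather than a number for the substituted question.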

Trace the arc of the wage example across the three methods pages. Page one (the naive library, before any primitives) gives the user a fluent number. The Composition page gives the library the ability to refuse on referent grounds. The Scope page extends the refusal to the model's frame. The Translation page closes the last gap: the user can now see, when the library refuses, what their own question was being substituted for at the interface — and why the substitution was the failure, not the model.

Translation is the part of the system most often treated as a prompt-engineering problem, with the implicit assumption that better prompts can solve it. The page's central claim is the opposite: translation cannot be solved on the user side or the mediator side, and so it has to be solved at the library interface — by the only entity that can see both ends.

Not the user side

Users come to the library to ask questions they don't already know the answers to. By construction, they cannot validate whether the translation preserved what they meant — the user has access to their own question and to the answer, but not to the structured query the mediator submitted, the model that was selected, or the assumptions that flowed through. Anyone who could fully audit the translation would not need the library at all. The audit cannot be the user's job.

What the user can do is small but important: confirm a back-translation, accept or reject a refusal, refine a question that has been flagged. The translation primitives are designed around what the user can plausibly do, not around hopes about what they could.

Not the mediator side

The LLM is fluent precisely about the kinds of substitutions that constitute frame shift. Self-checking against its own translation reuses the same cognitive surface that produced the shift in the first place. Better prompts can reduce the rate of shift; they cannot reliably detect what slipped through, because the LLM is unable to distinguish "I translated faithfully" from "I produced a fluent paraphrase that reads as faithful." The audit cannot be the mediator's job either — and any system that asks the mediator to audit itself has imported the original problem under a different name.

Only the library

The library has both sides of the interface. It has the user's original framing and the mediator's structured query, can compare them mechanically, can commit refusals as durable artifacts, and can require user confirmation at choke points the mediator cannot bypass. The translation primitives give the library this role explicitly. Without them, the library is a passive recipient of whatever the mediator submitted; with them, it is an active participant in the question's formation.

What the primitives are for

The translation primitives are not infrastructure for talking to the LLM. They are infrastructure for not trusting the LLM by default, while still using it as the interface. The distinction is the page: every other architecture either trusts the mediator implicitly or replaces it. This one keeps it, and audits it.


The translation primitives address a category of failure. They do not exhaust the problem of getting a question accurately to the library. Four failure modes lie outside what they can catch on their own.

Adversarial users

A user who knows the schema can game it deliberately — formulating queries that are technically faithful translations but designed to extract answers the library would refuse if asked plainly. The translation primitives resist the naive mediator path, where the LLM substitutes silently. They do not stop a user committed to defeating them. Adversarial robustness is a separate problem with separate tools (rate limits, query auditing, reviewer escalation), and the primitives do not pretend to be those tools.

Domain naïveté

If the user fundamentally does not know what they are asking — wrong rung, wrong domain, wrong frame — translation can surface the gap but cannot supply the missing expertise. The library can refuse helpfully, naming the variables and assumptions a faithful question would require; it cannot teach the user what they would have to know to ask a question worth answering. This is one of the categories the next page treats as irreducibly human work.

Faithful translation, unanswerable query

Translation can succeed and the answer can still be "no, this cannot be answered" — for identification reasons, for scope reasons, for absence of any relevant model. The library has to distinguish "I cannot translate" from "I translated faithfully and the question is unanswerable for other reasons." Conflating the two reintroduces the silent failure the page was supposed to eliminate. The refusal-as-information primitive depends on this distinction being maintained downstream.
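Keeping the two kinds of refusal apart is easier if the distinction is encoded in the type rather than in free text. The taxonomy below is a hypothetical sketch: the `RefusalKind` members follow the reasons listed above, but their names and wording are assumptions.

```python
from enum import Enum


class RefusalKind(Enum):
    """Hypothetical taxonomy; member names are illustrative."""
    UNTRANSLATABLE = "question could not be translated faithfully"
    NOT_IDENTIFIABLE = "translated faithfully, but the effect is not identifiable"
    OUT_OF_SCOPE = "translated faithfully, but outside every model's declared scope"
    NO_MODEL = "translated faithfully, but no relevant model exists"


def is_translation_failure(kind: RefusalKind) -> bool:
    """True only when the failure happened at the translation interface."""
    return kind is RefusalKind.UNTRANSLATABLE


def describe(kind: RefusalKind) -> str:
    """Label a refusal with the stage at which it arose."""
    stage = "translation" if is_translation_failure(kind) else "downstream"
    return f"[{stage}] {kind.value}"
```

A downstream consumer can then branch on the stage label instead of parsing the refusal message.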

Mediator drift

The mediator's job is to translate for the user, and the closer it gets to that role — through long-term memory, accumulated context, user-specific tuning — the more it serves user intent at the expense of library checks. The translation primitives are designed for an arms-length relationship between user, mediator, and library. A deeply personalized mediator blurs that relationship in ways the primitives can't fully address. Worth flagging as a research direction; not solved here.

Where translation ends

The four failure modes above are not translation's fault. They mark the boundary at which translation as a primitive ends and other questions begin — adversarial defense, education, downstream identification logic, and the irreducibly human work of being the kind of user who knows what they are asking.

Translation Interface
The transition from the user's natural-language question to the structured library call. The site of most translation-time error and the only place where the question and the query can be compared.
Query Schema
The structured object the library accepts as a query — slots for population, intervention, outcome, rung, time horizon, scope qualifiers. The artifact that makes structured comparison possible.
Back-Translation
The natural-language restatement of the structured query, surfaced to the user before the library answers. Borrowed from human-translation practice; rarely done in LLM systems.
Frame Shift
A systematic substitution at translation that changes the question while preserving its surface plausibility. Magnitude rescaling, population narrowing, regime substitution. Detected by comparing the structured query against the original framing.
Rung Shift
Frame shift specifically at the level of Pearl's ladder — a counterfactual question silently translated to an observational query, or an interventional query treated as observational. Mechanically detectable when rungs are explicit slots in the schema.
Scope Laundering
Frame shift undertaken specifically to make the library's scope checks pass. The mediator reformulates the user's out-of-scope question as a structurally similar in-scope question. The failure mode that connects translation to scope.
Refusal-as-Information
A library refusal that names what variable is missing, what model would be needed, what assumption would have to hold. The primitive that prevents fluent guesses from substituting for faithful translations.
Next Step

Three methods pages now. Each ends with categories of failure that mechanisms cannot solve — misdeclaration, contestation, identification under disagreement, adversarial framing, domain naïveté, mediator drift. None are bugs in the primitives. They are the irreducible human work the primitives leave behind. The next page names that pattern.


References

Pearl, J. & Bareinboim, E. (2014). External validity: From do-calculus to transportability across populations. Statistical Science 29(4): 579–595.
