The first three methods pages produce checkable artifacts at query time. None of them, by default, persists in a form that supports later consultation. Audit closes the gap between "the library produced a checkable artifact" and "anyone can check that artifact six months later." Without audit, every prior primitive is checkable in principle but not in practice.
This page covers one component of the Library. For the case for the Library itself — why SCMs, not ML or LLMs, are the framework that justifies a library — see Why Structured Causal Models?
The Audit Problem
Every prior page on this site has quietly assumed that an answer or a refusal is a one-time event. The user gets the result; the library moves on. But causal models embed contestable assumptions, and contestable assumptions have to be re-examined later — by a regulator who needs to know the basis of a decision, by a domain expert revising a model, by a modeler returning to their own work, by the library itself when a parent model gets updated and downstream queries inherit the change.
Consider a regulatory reviewer six months after the wage-elasticity exchange from the prior pages. The reviewer has access to the user's natural-language question, the library's eventual response (a refusal), and possibly the model that was consulted. The reviewer does not have, by default: the structured query the mediator submitted, the back-translation the user was shown, the active scope at query time, the frame-shift flags that fired, the version of the model consulted, the chain of bridge nodes if a composition was involved, or the assumption pushforward that recomputed identifiability.
Without those, the reviewer cannot determine whether the refusal was correct, whether the model has since changed in ways that would alter the answer, or whether the assumptions that flowed through the query still hold. The reviewer is left to reconstruct a question that has already been answered, with most of the answer's basis missing.
The first three methods pages produced artifacts — query schemas, scope cards, translation blocks, refusal-as-information. Each existed in the moment. None of them, by default, persists in a form that supports later consultation. The library's output was checkable; whether it remains checkable depends on what was kept.
This is the gap audit closes. The prior pages put the right structure on the live exchange between user and library. Audit is what survives the exchange — what the library remembers afterward, and what a reviewer can do with what was remembered.
Four Audit Primitives
The word "primitives" is the same one used on the Composition, Scope, and Translation pages, on purpose. These are the library's basic operations for preserving and consulting the trail of a query. Each one is the durable counterpart of something the prior pages produced ephemerally.
1. Recording schema
A structured object that captures, per query: the user's natural-language question, the structured query the mediator submitted, the back-translation and the user's confirmation (or its absence), the model selected, the active scope at query time, every check that fired (passed or failed), and the final answer or refusal with its diagnosis. This is the durable counterpart to the query schema — the recording schema is what the query schema becomes after the query is over.
The recording schema is the only primitive on this page that has to be designed up front. What is not in the schema cannot be consulted later, regardless of how good the audit hooks are. The lineage here is the model-cards and datasheets-for-datasets tradition (Mitchell et al., 2019) — structured documentation that travels with a model — extended from the model itself to the queries made against it.
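To make the shape of such a record concrete, here is a minimal Python sketch. The class and field names (`QueryRecord`, `CheckResult`, `active_scope`, and the rest) are illustrative assumptions, not the library's actual schema, and the example values are placeholders loosely based on the wage-elasticity exchange.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class CheckResult:
    name: str          # hypothetical check identifier
    passed: bool
    detail: str = ""


@dataclass(frozen=True)
class QueryRecord:
    question: str                    # the user's natural-language question
    structured_query: dict           # what the mediator submitted
    back_translation: str            # what the user was shown
    user_confirmed: Optional[bool]   # None records the absence of a confirmation
    model_id: str                    # model selected, including its version
    active_scope: dict               # scope in force at query time
    checks: Tuple[CheckResult, ...]  # every check that fired, passed or failed
    outcome: str                     # "answer" or "refusal"
    diagnosis: str                   # the answer, or the reason for the refusal

    def to_json(self) -> str:
        # Serialize with sorted keys so stored records are stable across runs.
        return json.dumps(asdict(self), sort_keys=True)


# Placeholder values for illustration only.
record = QueryRecord(
    question="Effect of a 50% national minimum-wage hike?",
    structured_query={"magnitude_pct": 12, "level": "state_aggregate"},
    back_translation="Effect of a 12% aggregated state-level increase",
    user_confirmed=None,
    model_id="wage-elasticity-v2.1.0",
    active_scope={"regime_min_pct": 5, "regime_max_pct": 25},
    checks=(
        CheckResult("frame_shift", False, "50% national rewritten as 12% state-level"),
        CheckResult("regime_scope", False, "50% is outside 5-25%"),
    ),
    outcome="refusal",
    diagnosis="requested magnitude is outside the model's regime scope",
)
print(record.to_json())
```

The design choice worth noticing is that every field is required except the check detail: a field that can be silently omitted at write time is a field a reviewer cannot rely on later.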
2. Query trail
The graph linking question, query, and answer, with every assumption that flowed through. Not a flat log — a structured trace that lets a reviewer ask "where did this number come from?" and follow the answer back through bridge nodes, identification recomputations, scope intersections, all the way to the user's original framing. Each node in the trail is a recorded artifact; each edge is an operation the library performed.
The query trail is what makes "checkable in principle" into "checkable in practice." Without it, recordings are isolated entries; with it, they are connected paths a human can walk.
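One way to picture the difference between a flat log and a trail is a small graph with labeled edges. The sketch below is an assumption about representation, not the library's implementation: `QueryTrail`, `add_edge`, and `walk_back` are hypothetical names, and the node and operation labels are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class QueryTrail:
    nodes: dict = field(default_factory=dict)    # node id -> recorded artifact
    parents: dict = field(default_factory=dict)  # node id -> [(parent id, operation)]

    def add_node(self, node_id, artifact):
        self.nodes[node_id] = artifact
        self.parents.setdefault(node_id, [])

    def add_edge(self, src, dst, operation):
        # `operation` is what the library did to turn `src` into `dst`.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be recorded before linking")
        self.parents[dst].append((src, operation))

    def walk_back(self, node_id):
        """Return (node, operation, parent) steps from node_id to its origins."""
        steps, stack, seen = [], [node_id], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            for parent, operation in self.parents[current]:
                steps.append((current, operation, parent))
                stack.append(parent)
        return steps


trail = QueryTrail()
for node_id in ("question", "structured_query", "scope_check", "refusal"):
    trail.add_node(node_id, {"kind": node_id})
trail.add_edge("question", "structured_query", "translate")
trail.add_edge("structured_query", "scope_check", "intersect_scope")
trail.add_edge("scope_check", "refusal", "refuse")

for node, operation, parent in trail.walk_back("refusal"):
    print(f"{node} <- {operation} <- {parent}")
```

Walking back from `refusal` ends at `question`, which is the property a reviewer needs: any output can be followed to the user's original framing.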
3. Provenance per artifact
Every model, bridge node, scope declaration, and identification claim in the library carries its source: who authored it, on what evidence, with what revision history, with what reviewing modeler signing off. Provenance is per-artifact, not per-query — it is a property of the library's contents, consulted whenever a query touches the artifact.
The query trail records what was used. Provenance records who stood behind it. Both are needed: a reviewer who can see the trail but not the provenance has the structure of an answer without the warrant; a reviewer who can see the provenance but not the trail has the warrant without knowing where it was applied.
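A per-artifact provenance record, and the walk through its revision history, might be sketched as follows. Everything here is assumed for illustration: the `Provenance` fields, the `supersedes` link between revisions, and the placeholder author, evidence, and reviewer values.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Provenance:
    artifact_id: str
    author: str
    evidence: Tuple[str, ...]
    reviewers: Tuple[str, ...]
    supersedes: Optional[str] = None  # id of the revision this one replaced


def provenance_walk(registry, artifact_id):
    """Return provenance records from artifact_id back through prior revisions."""
    history, seen, current = [], set(), artifact_id
    while current is not None and current not in seen:
        seen.add(current)
        record = registry[current]
        history.append(record)
        current = record.supersedes
    return history


def later_revisions(registry, artifact_id):
    """Return ids of revisions that directly or transitively supersede artifact_id."""
    later, frontier = [], [artifact_id]
    while frontier:
        current = frontier.pop()
        for record in registry.values():
            if record.supersedes == current and record.artifact_id not in later:
                later.append(record.artifact_id)
                frontier.append(record.artifact_id)
    return later


# Placeholder registry; names and evidence tags are not real.
registry = {
    "wage-elasticity-v2.1.0": Provenance(
        "wage-elasticity-v2.1.0", "modeler-a", ("source-1",), ("reviewer-x", "reviewer-y")
    ),
    "wage-elasticity-v2.2.0": Provenance(
        "wage-elasticity-v2.2.0", "modeler-a", ("source-1", "source-2"),
        ("reviewer-x", "reviewer-y"), supersedes="wage-elasticity-v2.1.0",
    ),
}
print(later_revisions(registry, "wage-elasticity-v2.1.0"))
```

Because the records live in a registry keyed by artifact rather than by query, the same lookup serves every query that touched the artifact.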
4. Audit hooks
The operations a reviewer can perform on a query trail without rerunning the original query. There are four worth naming, though more are possible:
- Replay — reconstruct what the answer would be under current model versions, holding the user's question fixed.
- Counterfactual replay — ask what the answer would have been if a particular assumption had been different. The library substitutes the alternative assumption into the trail and recomputes; the rest of the trail stands.
- Provenance walk — trace any node in the trail back through its source and revision history. Surface the modeler, the evidence, the peer review, the prior versions.
- Refusal review — re-examine a refusal to determine whether it should be reversed, sustained, or escalated under current models. The default outcome of a refusal review is itself recorded and appended to the trail.
Audit hooks are what convert recordings into something a human can use. A library with the first three primitives and no hooks has stored material no one can interrogate; the hooks are the interface between the trail and the reviewer.
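Of the four hooks, counterfactual replay is the least obvious mechanically, so a toy sketch may help. It assumes a trail stored as a plain dictionary with an `assumptions` mapping, and a caller-supplied `recompute` function; both are hypothetical, as are the assumption names, and the rule inside `toy_recompute` is a stand-in for whatever the library actually recomputes.

```python
def counterfactual_replay(trail, name, alternative, recompute):
    """Return a new trail with one assumption swapped and the outcome recomputed.

    The recorded trail is left untouched; only the copy differs.
    """
    if name not in trail["assumptions"]:
        raise KeyError(f"assumption not recorded in trail: {name}")
    assumptions = {**trail["assumptions"], name: alternative}
    return {
        **trail,
        "assumptions": assumptions,
        "outcome": recompute(trail["query"], assumptions),
        "counterfactual_of": trail["id"],
    }


def toy_recompute(query, assumptions):
    # Toy rule for illustration only: answer only if every assumption holds.
    return "answer" if all(assumptions.values()) else "refusal"


recorded = {
    "id": "trail-001",
    "query": {"magnitude_pct": 12},
    "assumptions": {"no_unmeasured_confounding": True, "within_regime": False},
    "outcome": "refusal",
}
what_if = counterfactual_replay(recorded, "within_regime", True, toy_recompute)
print(recorded["outcome"], what_if["outcome"])
```

Returning a copy tagged with `counterfactual_of` keeps the hypothetical result linked to the record it was derived from without overwriting that record.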
| Primitive | What it preserves | Operation it enables |
|---|---|---|
| Recording schema | The structured record of a single query interaction | Reviewer reconstruction of what was asked, returned, and refused |
| Query trail | The graph linking question, query, assumptions, and answer | Walking back from any output to its origin |
| Provenance | The source and revision history of every library artifact | Determining who authored what, on what evidence |
| Audit hooks | The operations available on a recorded trail | Replay, counterfactual replay, provenance walk, refusal review |
A Worked Example
The prior worked examples walked the user forward through a question. This one walks a reviewer backward through what was recorded. Six months after the wage-elasticity exchange, a regulator opens an inquiry into how the library handles policy questions about minimum-wage changes. They begin with the recorded query trail of the original interaction.
The trail is structured, not narrated. The reviewer can perform four operations on it.
Open the trail
The reviewer reads the recording schema. The frame-shift flags tell them the mediator silently re-translated the user's 50% national hike into a 12% aggregated state-level query. The back-translation row tells them the user was shown the substitution and did not confirm. The refusal row tells them the library declined. None of this had to be reconstructed — it was preserved.
Walk provenance
The reviewer follows provenance on wage-elasticity-v2.1.0. Modeler: named, with affiliation. Evidence: tagged data sources with date ranges. Peer review: signed off by two named modelers, both with their own provenance. Revision history: the model has since been updated to v2.2.0 — additional state-level data extended the regime scope from 5–25% to 5–35%. The provenance walk surfaces this without requiring the reviewer to know to look for it.
Replay under the current model
The reviewer invokes the replay hook against v2.2.0. The audit system reconstructs what the answer would have been if the original query had been submitted today. The current model still refuses on regime grounds — the user's 50% magnitude is closer to in-scope than under v2.1.0, but still outside the extended training distribution. The original refusal was correct then; an analogous refusal is correct now. The reviewer has confirmed both.
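The regime arithmetic behind that replay can be checked in a few lines. This is a sketch of the scope comparison only, using the figures given above (a 50% magnitude, a 5-25% scope under v2.1.0, a 5-35% scope under v2.2.0); `ModelVersion` and `replay` are hypothetical names, and a real replay would rerun every recorded check, not just this one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    model_id: str
    regime_min_pct: float
    regime_max_pct: float


def replay(magnitude_pct, model):
    """Re-evaluate the regime check for a recorded magnitude against one model version."""
    in_scope = model.regime_min_pct <= magnitude_pct <= model.regime_max_pct
    return {
        "model_id": model.model_id,
        "outcome": "answer" if in_scope else "refusal",
        # Distance above the upper bound of the regime; 0.0 when not above it.
        "overshoot_pct": max(0.0, magnitude_pct - model.regime_max_pct),
    }


v2_1_0 = ModelVersion("wage-elasticity-v2.1.0", 5.0, 25.0)
v2_2_0 = ModelVersion("wage-elasticity-v2.2.0", 5.0, 35.0)

original = replay(50.0, v2_1_0)
current = replay(50.0, v2_2_0)
print(original)
print(current)
```

Both versions refuse, but the overshoot shrinks from 25 to 15 percentage points, which is the sense in which the question is closer to in-scope under the newer model.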
Refusal review and sign-off
The reviewer accepts the refusal as correct and records the review on the query trail. The recording schema appends: reviewer name, date of review, models consulted, conclusion, signature. The trail now contains both the original interaction and the regulator's interpretation of it. The next reviewer who consults this query has access to the prior review's reasoning. Audit accumulates.
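The appended review can be sketched as an immutable record added to an append-only sequence. The field names mirror the list above; the `ReviewRecord` class, the `append_review` helper, and all values are assumptions for illustration, and the `signature` string is a placeholder rather than a real cryptographic signature.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple

# Possible conclusions of a refusal review, as named on this page.
ALLOWED_CONCLUSIONS = {"sustained", "reversed", "escalated"}


@dataclass(frozen=True)
class ReviewRecord:
    reviewer: str
    reviewed_on: date
    models_consulted: Tuple[str, ...]
    conclusion: str
    signature: str  # placeholder; a real system would need a verifiable signature


def append_review(reviews, review):
    """Return a new tuple with the review appended; earlier reviews are never edited."""
    if review.conclusion not in ALLOWED_CONCLUSIONS:
        raise ValueError(f"unknown conclusion: {review.conclusion}")
    return reviews + (review,)


# Placeholder values only.
first_review = ReviewRecord(
    reviewer="regulator-1",
    reviewed_on=date(2000, 1, 1),
    models_consulted=("wage-elasticity-v2.1.0", "wage-elasticity-v2.2.0"),
    conclusion="sustained",
    signature="placeholder-signature",
)
no_reviews = ()
reviews = append_review(no_reviews, first_review)
print(len(reviews), reviews[0].conclusion)
```

Using a tuple of frozen records means a later reviewer can add an interpretation but cannot rewrite an earlier one, which is one simple way to let audit accumulate.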
The library did not just produce a refusal at query time. It produced a refusal that could be defended six months later by someone who was not in the room — and a refusal review that future reviewers will inherit. That is the difference between an answer and a defensible answer, and between a judgment and an institutional one.
Why the Library and the Reviewer
The prior three methods pages each made a version of the same argument: only the library can do this work, because the user and the mediator cannot. This page makes a different argument. The library cannot do this work alone. It can only prepare the material for someone else to do.
The library can record everything. It cannot interpret what it recorded. The reviewer is the principal who decides whether a refusal stands, whether a revision is warranted, whether a contestation has merit, whether a model's provenance is adequate to the use being made of it. Those decisions are not derivable from the trail; they are made using the trail by someone with the domain knowledge, authority, and judgment to make them.
The library's job is to preserve the material the reviewer needs. The reviewer's job is to do the work the library cannot — judgment, contextual interpretation, scientific argument. Each is necessary; neither is sufficient.
This is the structural pivot of the methods arc. The first three pages argued that the library does the work the mediator and the user cannot. This page argues that the library does the work that prepares humans to do work only humans can do. The two arguments are not in tension. They describe different parts of the same division of labor: the library handles what is mechanizable, and prepares the residue for those who can handle what is not.
Provenance and audit are not infrastructure for compliance. They are infrastructure for keeping the library and its reviewers accountable to each other over time — the library by recording what it did and why, the reviewers by interpreting what was recorded and signing their interpretations into the same trail. Both parties are bound to the record; the record outlasts both.
What Audit Can't Do
Audit, like every prior primitive, addresses a category of failure rather than the whole problem. Four failure modes lie outside what these primitives can catch on their own.
Recording fidelity
What gets recorded is what the recording schema captures. Anything outside the schema is lost. The schema is itself a designed artifact, with assumptions about what will matter later that may turn out wrong. An audit reviewer can ask only the questions the schema anticipated. Audit cannot retroactively recover what was never recorded; the only fix is to revise the schema going forward, and the price of that fix is paid by all queries before the revision.
Reviewer capacity
Audit assumes a reviewer with the domain knowledge, time, and authority to interpret the trail. A library that produces audit-ready records faster than reviewers can consult them produces a backlog, not an audit. The bottleneck is human and scales differently from the library's throughput. Some queries will never be reviewed; the library cannot tell which in advance, and the gap between what is recordable and what is reviewable is itself an institutional question that the primitives do not resolve.
Provenance gaps
Provenance is only as good as the practices that established it. A model whose original modeler did not record their evidence carefully, or whose evidence has since become inaccessible, has degraded provenance regardless of how well the library tracks it now. The library cannot retroactively create provenance that was never there. The only mitigation is forward-looking: a provenance regime can prevent future gaps, but cannot close past ones.
The reviewer is in the system
The reviewer's decisions become part of the audit trail. Future reviewers consult them. Over time, the library accumulates an interpretive layer — past reviewer judgments — that shapes future ones. This is good (institutions accumulate wisdom) and bad (institutions accumulate received wisdom, including received error). The audit trail does not adjudicate between the two cases; humans do, and they do it with the same primitives as everyone else, against the same accumulated record.
The four failure modes above are not audit's fault. They mark the boundary at which audit as a primitive ends and other questions begin — schema design, institutional capacity, the historical depth of the discipline a model belongs to, and the irreducibly human work of reading a record and deciding what it means.
Four methods pages have now accumulated four categories of work the library hands off: contestations it cannot adjudicate, misdeclarations it cannot detect, framings it cannot teach, records it cannot interpret. The methods don't reduce human work; they concentrate it. The next page names the people who do that work — and why the work is theirs.
Mitchell, M., Wu, S., Zaldivar, A., Barnes, P., Vasserman, L., Hutchinson, B., Spitzer, E., Raji, I. D. & Gebru, T. (2019). Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency (FAT* '19). arXiv:1810.03993.