This is a vision page. It describes a research program your engagement participates in, not a future state your engagement delivers. The closing section names exactly what is shippable today and what is not. Read it with that frame in mind.

The proposition is simple to state and consequential in implication. Future AI systems will not reason from text alone. They will query reusable libraries of causal structure — graphs and equations encoding mechanism, intervention, and counterfactual logic — with language models acting as the interface, the translator, and sometimes the extractor. A reader poses a question in plain language; the language model interprets intent, retrieves the relevant causal models from a library, composes them where appropriate, runs the intervention or counterfactual operation against the structure, and returns an answer grounded in mechanism rather than fluent in pattern.

Today’s LLMs store statistical associations and textual co-occurrence. They do not store mechanisms in a form that can be intervened on. A library that does — in machine-readable, composable, audit-ready form — is the missing component the literature now coalesces around. Sometimes it is called a library of structured causal models; sometimes a causal knowledge graph; sometimes executable scientific objects. The name varies; the structural commitment is the same.

Papers stop being static text; they become executable causal objects.

The vision’s ambition extends beyond consulting deliverables. If it lands, it changes scientific publishing, knowledge management, AI explainability, and the substrate machine reasoning runs on.

Companion arguments

The frontier framing on this page is one of three adjacent arguments for the same architecture:

Why Not Use an LLM? — the negative case. Why a language model alone cannot answer causal questions, and where the failure compounds.

Why Structured Causal Models? — the constructive positive case. Three tools (ML, LLMs, SCMs), three distinct jobs.

Four Paradigms, One Bet — the positioning page. What this work bets on, and where it stands relative to other methodological commitments in the field.

The system architecture the literature converges on has five layers. Each layer has a distinct job, distinct failure modes, and distinct tooling.

Layer 1 · Knowledge extraction

Language models parse papers, textbooks, interview transcripts, internal reports — extracting candidate causal claims. This drug treats this condition. This exposure correlates with this outcome. This intervention reduces this risk. The output is not yet a model; it is structured candidate material for one.
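As a rough illustration of what "structured candidate material" could look like, here is a minimal sketch of a record for one extracted claim. Every name in it (`CandidateClaim`, `ClaimType`, the `source` identifiers) is hypothetical and chosen for this example; the rule that correlational claims are kept but flagged as non-causal is likewise an assumption, not something the pipeline above prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class ClaimType(Enum):
    # Relation types mirror the three example claims in the text.
    TREATS = "treats"
    CORRELATES_WITH = "correlates_with"
    REDUCES = "reduces"


@dataclass(frozen=True)
class CandidateClaim:
    """One extracted statement; not yet an edge in any model."""
    subject: str         # e.g. a drug, exposure, or intervention
    relation: ClaimType
    obj: str             # e.g. a condition, outcome, or risk
    source: str          # hypothetical identifier of the source document
    quote: str           # supporting text span, kept for expert review


def is_causal_candidate(claim: CandidateClaim) -> bool:
    """Assumed rule: a correlation is recorded but not proposed as an edge."""
    return claim.relation is not ClaimType.CORRELATES_WITH


claims = [
    CandidateClaim("drug_a", ClaimType.TREATS, "condition_b", "report-001", "..."),
    CandidateClaim("exposure_c", ClaimType.CORRELATES_WITH, "outcome_d", "paper-002", "..."),
]
causal = [c for c in claims if is_causal_candidate(c)]
```

Keeping the source and the quoted span on every record is what lets a later reviewer trace a proposed edge back to the sentence it came from.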

Layer 2 · SCM normalization

Candidate claims become formal structure: directed acyclic graphs, structural equations, intervention operators. Ontology references are resolved. Variables get definitions. The model becomes checkable — an artifact a reviewer can read and contest. Library design →
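One check that makes the normalized model "checkable" is acyclicity. The sketch below assumes candidate claims have already been reduced to (cause, effect) pairs and uses Python's standard-library `graphlib` to verify that they form a directed acyclic graph. The function name `build_dag` and the example variables are illustrative assumptions.

```python
from graphlib import CycleError, TopologicalSorter


def build_dag(edges):
    """Return (parents, causal_order), or raise ValueError if edges form a cycle.

    parents maps each variable to the set of its direct causes.
    """
    parents = {}
    for cause, effect in edges:
        parents.setdefault(cause, set())
        parents.setdefault(effect, set()).add(cause)
    try:
        # TopologicalSorter expects {node: predecessors}; parents come first.
        order = list(TopologicalSorter(parents).static_order())
    except CycleError as err:
        raise ValueError(f"edges are not acyclic: {err.args[1]}") from err
    return parents, order


edges = [("smoking", "tar"), ("tar", "cancer"), ("smoking", "cancer")]
parents, order = build_dag(edges)
```

The returned order is also the order in which structural equations can be evaluated, since every variable appears after all of its parents.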

Layer 3 · Validation

Domain experts review the structure. Edges, assumptions, confounders, and intervention claims are vetted, contested, and either confirmed or rejected. This is where the LLM’s extraction becomes the expert’s endorsement — or doesn’t. See the elicitation methods →

Layer 4 · Retrieval and composition

Given a question, the system retrieves relevant partial models from the library and composes them where they connect. Compatibility is checked; conflicts are surfaced rather than silently resolved. The library refuses to compose models whose scopes do not intersect cleanly. Composition primitives →
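The refusal behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions: scopes are modelled as sets of hypothetical tags, "connecting" means sharing at least one variable, and the only conflict detected is two models asserting opposite directions for the same pair. A real library would need richer checks, including acyclicity of the combined graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialModel:
    name: str
    edges: frozenset   # (cause, effect) pairs
    scope: frozenset   # hypothetical tags for the contexts the model covers

    @property
    def variables(self):
        return {v for edge in self.edges for v in edge}


class CompositionRefused(Exception):
    """Raised instead of silently resolving an incompatibility."""


def compose(a: PartialModel, b: PartialModel) -> PartialModel:
    shared_scope = a.scope & b.scope
    if not shared_scope:
        raise CompositionRefused(f"{a.name} and {b.name} have no scope in common")
    if not (a.variables & b.variables):
        raise CompositionRefused(f"{a.name} and {b.name} share no variables")
    # Surface direct contradictions: x -> y in one model, y -> x in the other.
    conflicts = {(x, y) for (x, y) in a.edges if (y, x) in b.edges}
    if conflicts:
        raise CompositionRefused(f"conflicting edge directions: {sorted(conflicts)}")
    # The composed model is only valid where both inputs are valid.
    return PartialModel(f"{a.name}+{b.name}", a.edges | b.edges, shared_scope)


m1 = PartialModel("dose_response", frozenset({("dose", "blood_level")}),
                  frozenset({"adults"}))
m2 = PartialModel("level_outcome", frozenset({("blood_level", "recovery")}),
                  frozenset({"adults", "children"}))
combined = compose(m1, m2)
```

Raising an exception rather than returning a best-effort merge is a deliberate choice in this sketch: it keeps the conflict visible to the person asking the question.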

Layer 5 · Causal reasoning engine

Runs the actual operations the LLM cannot: interventions (do(X)), counterfactuals (abduction, action, prediction), mediation analysis, identification under confounding. The answer is grounded in equations, not in pattern. The LLM returns to its proper role: speaking the answer in language the reader can act on.
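To make do(X) and the abduction, action, prediction steps concrete, here is a minimal sketch on a toy linear SCM with three variables, where z confounds x and y. The equations and coefficients are invented for illustration and do not come from any model discussed on this page.

```python
def simulate(noise, do=None):
    """Evaluate the structural equations in causal order.

    noise: exogenous terms u_z, u_x, u_y
    do:    variables forced to a value; their own equations are overridden
    """
    do = do or {}
    v = {}
    v["z"] = do.get("z", noise["u_z"])
    v["x"] = do.get("x", 2.0 * v["z"] + noise["u_x"])
    v["y"] = do.get("y", 3.0 * v["x"] + 1.0 * v["z"] + noise["u_y"])
    return v


def abduct(observed):
    """Abduction: recover the noise terms consistent with one observation."""
    return {
        "u_z": observed["z"],
        "u_x": observed["x"] - 2.0 * observed["z"],
        "u_y": observed["y"] - 3.0 * observed["x"] - 1.0 * observed["z"],
    }


def counterfactual(observed, do):
    """Action and prediction: rerun the model under do(...) with abducted noise."""
    return simulate(abduct(observed), do=do)


observed = {"z": 1.0, "x": 2.5, "y": 9.0}
# "What would y have been for this same unit had x been set to 0?"
cf = counterfactual(observed, do={"x": 0.0})
print(cf)  # {'z': 1.0, 'x': 0.0, 'y': 1.5}
```

Note that the intervention leaves z at its observed value: setting x cuts the arrow from z into x but does not change z itself, which is the difference between intervening on x and conditioning on it.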

Five active research programs are pushing this direction forward.

LLMs as causal-discovery assistants. Researchers now use language models as priors over causal structure learning — suggesting edges, proposing DAGs, encoding domain knowledge that pure data-driven methods would miss. Ban et al. (2023) is foundational here.

LLM + SCM integration. The substantive question is how to combine symbolic causal models, probabilistic reasoning, and LLM semantic understanding into a single architecture. The Wu et al. (2024) survey lays out the landscape: SCM-guided prompting, causal reasoning augmentation, intervention-aware LLMs.

Causal knowledge repositories. First-generation systems are appearing — machine-readable causal knowledge graphs constructed from survey and literature data with LLM extraction pipelines. Rawal et al. (2026) shows this transition from research idea to operational artifact.

Causal reasoning as a core LLM capability. A growing position holds that future language models must reason causally, not just predict text fluently. Kiciman et al. (2023) frames LLMs as interfaces to causal reasoning systems — the precise framing this page is built on.

Automatic SCM extraction. The frontier within the frontier: converting published literature directly into executable causal models, at scale, with benchmarked quality. Wan et al. (2024) surveys the current state and remaining open problems.

A separate but increasingly important strand handles the multiple-experts case — how the library should represent the situation where experts disagree about edges, confounders, and mechanisms. Cox (2026) formalizes competing expert SCMs as the artifact, rather than treating disagreement as noise.
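One simple way to keep disagreement as a first-class artifact is to store each expert's graph separately and report which edges are unanimous and which are contested. The sketch below is an illustrative assumption only; it is not the formalism in Cox (2026), and the expert names and variables are hypothetical.

```python
from collections import Counter

# Each expert's proposed edges are stored separately, never merged in place.
expert_graphs = {
    "expert_a": {("exposure", "outcome"), ("age", "outcome")},
    "expert_b": {("exposure", "outcome"), ("age", "exposure")},
}


def edge_support(graphs):
    """Split edges into those every expert asserts and those only some assert."""
    counts = Counter(edge for edges in graphs.values() for edge in edges)
    n = len(graphs)
    agreed = {edge for edge, c in counts.items() if c == n}
    contested = {
        edge: sorted(name for name, edges in graphs.items() if edge in edges)
        for edge, c in counts.items()
        if c < n
    }
    return agreed, contested


agreed, contested = edge_support(expert_graphs)
```

Here `contested` records who asserts each disputed edge, so a query that depends on one can report the disagreement instead of picking a side.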

The literature’s strongest claim is that several capabilities considered constitutive of robust general intelligence are not available from any current AI architecture on its own, but become natural in an SCM-library architecture.

| Capability | Why an SCM library is the right substrate |
| --- | --- |
| Persistent world models | Reusable mechanisms survive across contexts; an SCM is a stable object, not a regenerated guess. |
| Counterfactual reasoning | Native to SCMs through abduction-action-prediction. Not natural to LLMs at all. |
| Scientific abstraction | Explicit structure is what makes a claim communicable, contestable, and reusable. |
| Transfer learning | A causal module learned in one domain can compose with modules from another, provided their scopes are checkable, which they are. |
| Explainability | Every answer traces to specific equations, edges, and assumptions, not to weights in a network. |
| Planning under intervention | Running do(X) and observing the propagated effect is planning, in the formal sense. |

The reader authorizing an engagement does not need this argument to be true. But if it is true, and a growing share of the field now thinks it is, then the work being authorized is not a side project. It is a piece of where the field is going. Your causal model library, when built, is the kind of artifact this architecture is made of.

The vision above is research; most of it is not yet deployed practice. The honest accounting separates what is shippable now from what is not.

Shippable now
  • One organization’s causal model library, built with your experts.
  • Structural equations vetted, audited, versioned.
  • Composition and scope checks at query time.
  • LLM interface for plain-language querying.
  • Provenance tracking for every claim.
  • Your team capable of extending it.

Research, not delivery
  • Cross-organization SCM federation.
  • Automatic, unsupervised SCM extraction from literature at scale.
  • General-purpose causal reasoning agents.
  • The “GitHub for causality.”
  • Autonomous scientific discovery.
  • AGI as a near-term consequence.

The honest framing is that the engagement delivers the well-grounded items under “Shippable now.” The items under “Research, not delivery” are real research, but not something you can authorize. The work is to build something that is sturdy now and that compounds in value if the research direction continues. If it does not continue, the library you built is still the auditable causal model your organization needs. The downside is bounded.

  1. Ban, T., Chen, L., Wang, X., & Chen, H. (2023). From Query Tools to Causal Architects: Harnessing Large Language Models for Advanced Causal Discovery from Data. arXiv:2306.16902
  2. Kiciman, E., Ness, R., Sharma, A., & Tan, C. (2023). Causal Reasoning and Large Language Models: Opening a New Frontier for Causality. OpenReview
  3. Wan, G., Lu, Y., Wu, Y., et al. (2024). Large Language Models for Causal Discovery: Current Landscape and Future Directions. arXiv:2402.11068
  4. Wu, A., Kuang, K., Zhu, M., et al. (2024). Causality for Large Language Models. arXiv:2410.15319
  5. Rawal, A., Johnson, K., & Martinez, R. (2026). LLM Assisted Causal Knowledge Graph Generation Framework for Survey Data. Journal of Engineering Design. Taylor & Francis
  6. Cox, L. A. (2026). Combining Diverse Expert Opinions in Risk Analysis Using Relative Causal Knowledge. Risk Analysis
