Structure Learning from Data

Structure learning algorithms identify conditional independence structure in data. For small-sample, expert-rich enterprise risk problems, expert encoding dominates. For large variable sets with abundant data, structure learning earns its complexity cost.

Structure learning requires sample sizes that are often unavailable for enterprise risk problems. Expert knowledge encodes mechanism — which generalises to new regimes better than historical correlations.

What structure learning does

A Bayesian network’s structure — the directed acyclic graph over the variable set — encodes the conditional independence relationships among the variables. Structure learning is the problem of recovering this graph from data: given a dataset of observations over a set of variables, find the DAG that best explains the observed conditional independence structure.

The problem is computationally hard: the number of possible DAGs over n variables grows super-exponentially with n (25 graphs for three variables, 543 for four), so exhaustive search is feasible only for very small graphs. Exact methods exist but are practical only for modest variable counts. In practice, most structure learning algorithms rely on heuristics (greedy search, constraint-based pruning, or a combination) and return a graph with no guarantee of global optimality.
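To make the growth rate concrete, the following is a minimal sketch that counts labelled DAGs using Robinson's recurrence. The function name count_dags is chosen here for illustration and is not taken from any library.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labelled DAGs on n nodes, via Robinson's recurrence."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edge.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, count_dags(n))
```

Running it shows the count passing a million by six variables, which is why search strategies matter more than raw compute.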

The output of a structure learning algorithm is a graph (or an equivalence class of graphs that the data cannot tell apart); the parameters (CPTs) are then estimated for that graph. The resulting model supports the same inference tasks as an expert-encoded BN (prediction, diagnosis, and intervention reasoning), but with the additional uncertainty that the graph itself may be wrong.

Score-based methods

Score-based methods assign a score to each candidate graph, typically a penalised likelihood that rewards data fit while penalising model complexity, and search for the highest-scoring graph. Common scoring functions include the Bayesian Information Criterion (BIC), the Bayesian Dirichlet equivalent (BDe) score, and the Akaike Information Criterion (AIC).
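As an illustration of a penalised-likelihood score, here is a minimal sketch of BIC for a discrete network, assuming complete data held as a list of dicts. The function name bic_score and the data layout are assumptions made for this example, not a library API.

```python
from collections import Counter
from math import log


def bic_score(rows, parents):
    """BIC of a discrete Bayesian network (higher is better).

    rows    : list of dicts mapping variable name -> observed state
    parents : dict mapping each variable -> tuple of its parent names
    """
    n = len(rows)
    states = {v: {r[v] for r in rows} for v in parents}
    log_lik = 0.0
    n_params = 0
    for child, pa in parents.items():
        joint = Counter((tuple(r[p] for p in pa), r[child]) for r in rows)
        marg = Counter(tuple(r[p] for p in pa) for r in rows)
        # Maximum-likelihood fit: sum of N_ijk * log(N_ijk / N_ij).
        log_lik += sum(c * log(c / marg[cfg]) for (cfg, _), c in joint.items())
        # Free parameters: (child states - 1) per parent configuration.
        q = 1
        for p in pa:
            q *= len(states[p])
        n_params += (len(states[child]) - 1) * q
    return log_lik - 0.5 * log(n) * n_params


if __name__ == "__main__":
    # Toy data in which Y always copies X.
    rows = [{"X": 0, "Y": 0}] * 50 + [{"X": 1, "Y": 1}] * 50
    print("no edge:", bic_score(rows, {"X": (), "Y": ()}))
    print("X -> Y :", bic_score(rows, {"X": (), "Y": ("X",)}))
```

On the toy data the graph with the edge scores higher: the gain in fit outweighs the penalty for the extra parameter.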

The BDe score has a particularly useful property: it is the marginal likelihood of the data given the graph under a specific class of Dirichlet parameter priors, so combined with a prior over graphs it is proportional to the posterior probability of the graph given the data. Structure learning with the BDe score is therefore Bayesian: maximising it finds the maximum a posteriori graph, and the same quantity can be used to build a posterior distribution over graphs rather than a point estimate.
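Written out, the relationship looks as follows. This is the standard Bayesian Dirichlet form, assuming complete data and independent Dirichlet priors. The symbols are introduced here for illustration: N_ijk counts the rows in which variable i takes its k-th state while its parents take their j-th configuration, N_ij sums N_ijk over k, alpha_ijk are the Dirichlet hyperparameters with alpha_ij their sum over k, r_i is the number of states of variable i, and q_i the number of parent configurations.

```latex
P(G \mid D) \;\propto\; P(D \mid G)\, P(G)

P(D \mid G) \;=\; \prod_{i=1}^{n} \prod_{j=1}^{q_i}
  \frac{\Gamma(\alpha_{ij})}{\Gamma(\alpha_{ij} + N_{ij})}
  \prod_{k=1}^{r_i}
  \frac{\Gamma(\alpha_{ijk} + N_{ijk})}{\Gamma(\alpha_{ijk})}
```

The common BDeu variant sets alpha_ijk = alpha / (r_i q_i) for a single equivalent sample size alpha, which leaves one number to choose instead of a full prior table.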

Common search strategies include greedy hill-climbing (add, remove, or reverse one edge at a time, accepting changes that improve the score), simulated annealing (accept some score-reducing changes to escape local optima), and evolutionary algorithms. For graphs with more than 20–30 variables, these heuristics become increasingly unreliable at recovering the true structure, even with large datasets.
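The greedy hill-climbing loop described above can be sketched as follows. This is a minimal illustration under stated assumptions: the score argument is any function from an edge set to a number (BIC or BDe in practice), and the toy score in the usage example is hypothetical, chosen only so the search has something to climb.

```python
from itertools import permutations


def is_acyclic(nodes, edges):
    """Repeatedly strip nodes with no incoming edge; fail if none exist."""
    remaining, edges = set(nodes), set(edges)
    while remaining:
        roots = {v for v in remaining if not any(b == v for _, b in edges)}
        if not roots:
            return False
        remaining -= roots
        edges = {(a, b) for a, b in edges if a not in roots}
    return True


def neighbours(nodes, edges):
    """Graphs one add, remove, or reverse move away that remain acyclic."""
    for a, b in permutations(nodes, 2):
        if (a, b) in edges:
            yield edges - {(a, b)}                  # remove a -> b
            cand = (edges - {(a, b)}) | {(b, a)}    # reverse a -> b
        elif (b, a) not in edges:
            cand = edges | {(a, b)}                 # add a -> b
        else:
            continue
        if is_acyclic(nodes, cand):
            yield cand


def hill_climb(nodes, score, start=frozenset()):
    """Move to the best-scoring neighbour until no neighbour improves."""
    current = frozenset(start)
    best = score(current)
    while True:
        cand = max(neighbours(nodes, current), key=score, default=None)
        if cand is None or score(cand) <= best:
            return current, best
        current, best = frozenset(cand), score(cand)


if __name__ == "__main__":
    # Hypothetical score: negative distance to a known target graph.
    target = {("A", "B"), ("B", "C")}
    graph, value = hill_climb(["A", "B", "C"], lambda e: -len(set(e) ^ target))
    print(sorted(graph), value)
```

Because the loop only ever accepts strict improvements, it stops at the first local optimum it reaches, which is the weakness simulated annealing and random restarts try to address.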

Constraint-based methods

Constraint-based methods, of which the PC algorithm is the best known, work by testing conditional independence: is X ⊥ Y | Z for each pair of variables (X, Y) and candidate conditioning set Z? The results of these tests constrain which edges can be present in the graph. The algorithm iteratively removes edges where the corresponding conditional independence holds and orients the remaining edges using rules that preserve acyclicity and d-separation.
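The edge-removal phase can be sketched as below. This is a simplified illustration of the skeleton step only, not the full PC algorithm: edge orientation is omitted, and the indep argument stands in for a statistical test. In the usage example it is a hypothetical oracle that already knows the answer for a chain A -> B -> C.

```python
from itertools import combinations


def pc_skeleton(nodes, indep):
    """Skeleton phase of a PC-style search with a pluggable CI test.

    indep(x, y, cond) -> True if x is judged independent of y given cond.
    Returns the surviving undirected edges and the separating sets found.
    """
    adj = {v: set(nodes) - {v} for v in nodes}
    sepset = {}
    size = 0
    # Grow the conditioning-set size until no node has enough neighbours.
    while any(len(adj[x]) - 1 >= size for x in nodes):
        for x in nodes:
            for y in list(adj[x]):
                for cond in combinations(sorted(adj[x] - {y}), size):
                    if indep(x, y, set(cond)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(cond)
                        break
        size += 1
    edges = {frozenset((x, y)) for x in nodes for y in adj[x]}
    return edges, sepset


if __name__ == "__main__":
    def chain_oracle(x, y, cond):
        # In A -> B -> C the only independence is A and C given B.
        return {x, y} == {"A", "C"} and "B" in cond

    edges, sepset = pc_skeleton(["A", "B", "C"], chain_oracle)
    print(sorted(tuple(sorted(e)) for e in edges))
    print(sepset)
```

With real data the oracle is replaced by a test such as chi-squared or G-squared, and every wrong test result propagates into the graph, which is where the sample-size sensitivity comes from.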

The key advantage of constraint-based methods: they are more transparent about their assumptions. The algorithm’s output is directly justified by the conditional independence tests — each edge is present because no conditioning set was found that makes the endpoints independent. The key disadvantage: conditional independence tests require large samples to be reliable, and they are sensitive to violations of the faithfulness assumption (the assumption that the data-generating process does not have exact cancellations of causal effects).

Hybrid methods combine score-based and constraint-based approaches: use constraint-based tests to restrict the search space, then score the candidate graphs within that restricted space. The Max-Min Hill-Climbing (MMHC) algorithm is a widely used hybrid, and often outperforms pure score-based or constraint-based methods on realistic sample sizes.

Why expert encoding usually wins

Structure learning requires sample sizes that are often unavailable for enterprise risk problems. To reliably detect a conditional independence relationship between two binary variables given a set of three conditioning variables, hundreds to thousands of observations per conditioning combination are needed. For a credit risk model with seven input nodes, the number of required observations can exceed the available dataset by orders of magnitude.
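The arithmetic behind this can be sketched in a few lines. The helper below is hypothetical and assumes observations are spread evenly across conditioning combinations, which is optimistic: real data is uneven, so the rarest combinations need far more rows in total.

```python
from math import prod


def required_observations(cardinalities, per_cell):
    """Rows needed if every conditioning combination gets per_cell rows."""
    return prod(cardinalities) * per_cell


if __name__ == "__main__":
    # Three binary conditioning variables give 2**3 = 8 combinations.
    print(required_observations([2, 2, 2], 100))    # 800
    print(required_observations([2, 2, 2], 1000))   # 8000
    # Seven binary input nodes give 2**7 = 128 combinations.
    print(required_observations([2] * 7, 1000))     # 128000
```

Each additional binary conditioning variable doubles the requirement, and variables with more than two states grow it faster still.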

Expert knowledge is more reliable than data-sparse inference for another reason: it is less sensitive to the historical regime. A structure learned from data reflects the conditional independence structure of the historical period. A structure encoded by domain experts reflects their understanding of the mechanism — which is more likely to generalize to new regimes, including the tail scenarios that matter most.

The practical rule: use structure learning as a complement to expert encoding rather than a replacement. Run structure learning on available data to identify potential edges that expert elicitation may have missed, or to flag expert-encoded edges that the data does not support. Use the output as input to the expert discussion, not as a substitute for it.
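One way to prepare that input for the expert discussion is a simple comparison of the two edge sets. The sketch below is an illustration only: the function name and the variable names in the example are hypothetical, and edges are compared without direction because learned orientations are often not identifiable from data.

```python
def compare_graphs(expert_edges, learned_edges):
    """Compare two edge lists on their skeletons (direction ignored)."""
    expert = {frozenset(e) for e in expert_edges}
    learned = {frozenset(e) for e in learned_edges}

    def as_pairs(edge_set):
        return sorted(tuple(sorted(e)) for e in edge_set)

    return {
        # Found in the data but absent from the expert graph.
        "candidate_missing": as_pairs(learned - expert),
        # Encoded by experts but not supported by the data.
        "unsupported": as_pairs(expert - learned),
        "agreed": as_pairs(expert & learned),
    }


if __name__ == "__main__":
    expert = [("leverage", "default"), ("sector", "default")]
    learned = [("default", "leverage"), ("rates", "default")]
    for label, edges in compare_graphs(expert, learned).items():
        print(label, edges)
```

Each list is a prompt for discussion rather than a verdict: an unsupported edge may reflect a mechanism the historical data never exercised.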

When to use structure learning

Use structure learning when: the variable set has more than 15-20 variables (making expert elicitation of all pairwise relationships impractical); domain expertise cannot reliably specify the direction of causal relationships; sufficient data exists to support reliable conditional independence testing (typically >1,000 observations per conditioning combination at the relevant granularity); and the goal is discovery rather than confirmation of a known causal structure.

Bioinformatics and genomics are the canonical domain where structure learning performs well: thousands of gene expression variables, large datasets from high-throughput experiments, and causal relationships that cannot be specified from first principles. Enterprise risk — with its small datasets, expert-rich domain knowledge, and emphasis on interpretability — is at the opposite end of this spectrum.

The middle ground: process mining for operational risk, where transaction logs provide large datasets of process execution sequences and the causal structure of the process is partially but not fully known. Structure learning on process logs can identify bottlenecks and dependencies that expert elicitation would miss, while expert knowledge constrains the search space and validates the output.

In the cases

Utilities

Supply Chain Risk

The hidden fabricator structure could not have been learned from the scorecard data — the data reflected the labels, not the underlying causal dependencies.

Argument

The Four Phases

The engagement produces expert-encoded graphs precisely because structure learning from data is insufficient for the problems where causal models matter most.

Condition	Expert encoding	Structure learning
Sample size	Works with small samples	Requires hundreds–thousands per conditioning set
Domain knowledge	Encodes mechanism directly	Infers from correlation — cannot distinguish causal direction
Rare events	Expert probability elicitation covers sparse data	Insufficient data to detect rare causal relationships
Variable count	Becomes unwieldy above ~30–50 variables	Handles large variable sets algorithmically
Best use	Risk, compliance, clinical — expert-rich, data-sparse	Sensor networks, genomics — data-rich, domain-sparse

You can learn a Bayesian network’s graph from data. You should usually encode it from expert knowledge instead.
