Hybrid Bayesian Networks

Three-state discrete nodes are tractable and interpretable, but they collapse the tail behavior that risk management most needs to see. Hybrid BNs preserve both tractability and tail detail: discrete nodes for structure, continuous nodes for outcomes.

The discretisation trade-off

Every Bayesian network built for enterprise risk quantification faces a fundamental choice: represent each variable as a discrete set of states (Low/Medium/High, or finer buckets) or as a continuous distribution (normal, lognormal, beta, or a mixture). The practical default is discretisation — it is tractable, interpretable, and the resulting CPTs can be elicited from experts using natural language questions about conditional probabilities.

The cost is well-understood in principle but rarely quantified in practice: discretisation introduces information loss, changes the tail behavior of the model, and makes it impossible to propagate the full uncertainty from continuous inputs through the network to a continuous output. For many enterprise risk applications — where the questions are about whether risk is “high” or “low” — this cost is acceptable. For applications where the precise tail distribution matters — VaR, expected shortfall, reserve estimation — it may not be.

The specific failure mode

Discretising a heavy-tailed continuous variable into three states (Low/Medium/High) collapses the tail behavior into the “High” state’s probability. The model loses the distinction between a 1-in-20 event and a 1-in-1,000 event — both are “High.” For risk management, this is precisely the distinction that matters most. Discretisation is safe for causal structure questions (does this variable affect that one?) but dangerous for tail risk quantification.
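The collapse can be made concrete with a small sketch. The lognormal parameters and the cut points (50th and 90th percentiles) are illustrative assumptions, not values from any model described here; the point is only that two very different loss levels land in the same state.

```python
import math
from statistics import NormalDist

# Illustrative (assumed) lognormal parameters for an annual loss variable.
MU, SIGMA = 0.0, 1.5


def lognormal_quantile(p, mu=MU, sigma=SIGMA):
    """Loss level that is exceeded with probability 1 - p."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


# Assumed three-state scheme: Low below the median, High above the 90th percentile.
LOW_CUT = lognormal_quantile(0.50)
HIGH_CUT = lognormal_quantile(0.90)


def to_state(loss):
    """Map a continuous loss to its discrete state."""
    if loss < LOW_CUT:
        return "Low"
    if loss < HIGH_CUT:
        return "Medium"
    return "High"


loss_1_in_20 = lognormal_quantile(0.95)     # exceeded once in 20 years on average
loss_1_in_1000 = lognormal_quantile(0.999)  # exceeded once in 1,000 years on average
severity_ratio = loss_1_in_1000 / loss_1_in_20

print(to_state(loss_1_in_20), to_state(loss_1_in_1000))
print(f"1-in-1,000 loss is {severity_ratio:.1f}x the 1-in-20 loss")
```

Under these assumptions both events map to "High" even though the rarer loss is several times larger, which is the information a three-state node cannot carry.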

What is lost in discretisation

Tail resolution. A three-state discretisation with states Low, Medium, High assigns all probability mass above a threshold to the “High” state. The distribution within that state is lost. For risk quantification, the within-High distribution (for example, the difference between a 1% and a 0.1% annual probability) is often the quantity of interest. Increasing the number of states improves resolution, but the number of CPT entries is the product of the state counts of the node and all of its parents, so finer buckets quickly become costly and the table grows exponentially with the number of parents.
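A rough sense of that cost, as a sketch: the helper below simply multiplies state counts. The choice of three parents and the state counts tried are arbitrary illustrations.

```python
def cpt_entries(node_states, parent_state_counts):
    """Number of probabilities in a CPT: node states times every parent's states."""
    entries = node_states
    for count in parent_state_counts:
        entries *= count
    return entries


# Hypothetical node with three parents, all discretised to the same resolution.
for states in (3, 5, 10, 20):
    size = cpt_entries(states, [states, states, states])
    print(f"{states:>2} states per node -> {size:>7} CPT entries")
```

Each of those entries has to be elicited or estimated, which is why refining the buckets is rarely a practical route to tail resolution.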

Aggregation accuracy. When two discrete random variables are combined (for example, the sum of losses from two business lines), the discrete convolution introduces quantisation error that accumulates with each aggregation step. Monte Carlo simulation avoids this by working with continuous samples throughout the aggregation. A discrete BN performs the aggregation through the CPT of the aggregate (child) node, whose resolution is limited by the discretisation scheme of that node and of its parents.
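The effect on an aggregate tail can be sketched by summing two losses with and without discretisation. Everything numeric here is an assumption for illustration: independent lognormal(0, 1) losses, cut points at the 50th and 90th percentiles, and each state represented by its within-state median.

```python
import math
import random
from statistics import NormalDist

random.seed(0)
MU, SIGMA = 0.0, 1.0  # assumed lognormal parameters for both business lines
N = 50_000


def quantile(p):
    return math.exp(MU + SIGMA * NormalDist().inv_cdf(p))


CUTS = [quantile(0.50), quantile(0.90)]                  # Low | Medium | High
REPS = [quantile(0.25), quantile(0.70), quantile(0.95)]  # one value per state


def discretise(x):
    """Replace a continuous loss with the representative value of its state."""
    if x < CUTS[0]:
        return REPS[0]
    if x < CUTS[1]:
        return REPS[1]
    return REPS[2]


def percentile(values, p):
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(p * len(ordered)))]


line_a = [random.lognormvariate(MU, SIGMA) for _ in range(N)]
line_b = [random.lognormvariate(MU, SIGMA) for _ in range(N)]

continuous_total = [a + b for a, b in zip(line_a, line_b)]
discrete_total = [discretise(a) + discretise(b) for a, b in zip(line_a, line_b)]

print("distinct discrete totals:", len(set(discrete_total)))
print("99.9th pct, continuous:", round(percentile(continuous_total, 0.999), 2))
print("99.9th pct, discrete:  ", round(percentile(discrete_total, 0.999), 2))
```

The discretised total can take only a handful of values and is capped at twice the "High" representative, so its extreme percentiles sit well below those of the continuous sum.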

Asymmetric conditionals. Many conditional distributions in risk modeling are asymmetric: the distribution of Loss Event Frequency given High Threat Capability and Low Control Strength is not a symmetric function of its parents. A CPT can represent this asymmetry, but only at the resolution of the discretisation. A continuous conditional distribution — a lognormal with mean and variance that are functions of the parent states — can represent the asymmetry exactly.

Hybrid BN architectures

A hybrid Bayesian network contains a mix of discrete and continuous nodes. Exact inference algorithms impose one structural constraint: a discrete node cannot have a continuous parent. The conditional distribution of a discrete node with a continuous parent would require integrating over the parent’s distribution, which makes exact inference intractable in the general case.

The allowable structures are: (1) discrete nodes with discrete parents (standard CPT); (2) continuous nodes with continuous parents (conditional density, e.g., a Gaussian with mean that is a linear function of the parents); (3) continuous nodes with discrete parents (mixture of distributions, one per discrete parent state); and (4) continuous nodes with mixed discrete and continuous parents (conditional linear Gaussian). Structure (4) is the most flexible and covers most practical cases.
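The constraint is easy to check mechanically. The sketch below is not tied to any BN tool; the dictionary layout is an assumption, and "Alert Level" is a hypothetical node added only to trigger a violation.

```python
def structure_violations(kinds, parents):
    """Return (parent, child) edges where a discrete child has a continuous parent."""
    bad_edges = []
    for child, parent_list in parents.items():
        if kinds[child] != "discrete":
            continue  # continuous children may have parents of either kind
        for parent in parent_list:
            if kinds[parent] == "continuous":
                bad_edges.append((parent, child))
    return bad_edges


kinds = {
    "Macro Regime": "discrete",
    "Severity Score": "continuous",
    "Loss Magnitude": "continuous",
    "Alert Level": "discrete",  # hypothetical node, for illustration only
}
parents = {
    "Macro Regime": [],
    "Severity Score": [],
    "Loss Magnitude": ["Macro Regime", "Severity Score"],  # structure (4): allowed
    "Alert Level": ["Loss Magnitude"],                     # discrete child of continuous parent
}

violations = structure_violations(kinds, parents)
print(violations)
```

Running a check like this before parameterising the model catches edges that would force approximate inference or a redesign of the node.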

Practical tools: AgenaRisk (used throughout this site) supports hybrid BNs with a range of parametric conditional distributions. Hugin and Netica support conditional linear Gaussian models. PyMC and Stan support full Bayesian hierarchical models that generalize beyond the BN structure but sacrifice the exact inference guarantee.

Conditional linear Gaussian models

In a Conditional Linear Gaussian (CLG) model: discrete nodes have discrete parents only (standard CPT); continuous nodes have a Gaussian distribution whose mean is a linear function of the continuous parents and whose variance does not depend on them, with one set of parameters (intercept, slopes, variance) per combination of discrete parent states.

Example: Loss Magnitude (continuous) has two parents — Macro Regime (discrete: expansion/recession) and Severity Score (continuous). The CLG structure specifies: Loss Magnitude ~ N(μ_regime + β_regime × Severity Score, σ²_regime), with separate mean, slope, and variance parameters for each macro regime. This is a mixture of regression models, one per discrete parent state.
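A minimal sampling sketch of this example follows. The parameter values for each regime are invented for illustration; only the functional form comes from the text.

```python
import random
from statistics import mean

# Assumed per-regime parameters: intercept (mu), slope (beta), standard deviation (sigma).
CLG_PARAMS = {
    "expansion": {"mu": 1.0, "beta": 0.5, "sigma": 0.5},
    "recession": {"mu": 3.0, "beta": 1.2, "sigma": 1.5},
}


def sample_loss_magnitude(regime, severity_score, rng):
    """Draw Loss Magnitude ~ N(mu_regime + beta_regime * severity, sigma_regime^2)."""
    p = CLG_PARAMS[regime]
    return rng.gauss(p["mu"] + p["beta"] * severity_score, p["sigma"])


rng = random.Random(42)
severity = 2.0
draws = {
    regime: [sample_loss_magnitude(regime, severity, rng) for _ in range(20_000)]
    for regime in CLG_PARAMS
}

for regime, values in draws.items():
    print(regime, "sample mean:", round(mean(values), 2))
```

The same Severity Score produces a different regression line and a different spread in each regime, which is what "one regression model per discrete parent state" means in practice.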

CLG models support exact inference via the junction tree algorithm with Gaussian message passing. The posterior over continuous nodes is a mixture of Gaussians — which can represent multimodal distributions if the discrete parent has multiple states. This is considerably more expressive than a fully discrete model while remaining computationally tractable.
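For a single continuous node the mixture can be written out directly. The sketch below applies Bayes' rule by hand rather than running a junction tree, and the prior and regime parameters are assumed values chosen for illustration.

```python
from statistics import NormalDist

# Assumed prior over the discrete parent and per-regime (mu, beta, sigma).
PRIOR = {"expansion": 0.8, "recession": 0.2}
PARAMS = {"expansion": (1.0, 0.5, 0.5), "recession": (3.0, 1.2, 1.5)}


def component(regime, severity):
    """Gaussian for Loss Magnitude given one regime and a Severity Score."""
    mu, beta, sigma = PARAMS[regime]
    return NormalDist(mu + beta * severity, sigma)


def mixture_pdf(x, severity):
    """Density of Loss Magnitude with the regime marginalised out."""
    return sum(PRIOR[r] * component(r, severity).pdf(x) for r in PRIOR)


def mixture_tail(x, severity):
    """P(Loss Magnitude > x): a continuous tail probability, not a 'High' bucket."""
    return sum(PRIOR[r] * (1.0 - component(r, severity).cdf(x)) for r in PRIOR)


def regime_posterior(x, severity):
    """P(regime | observed Loss Magnitude x, Severity Score)."""
    joint = {r: PRIOR[r] * component(r, severity).pdf(x) for r in PRIOR}
    total = sum(joint.values())
    return {r: value / total for r, value in joint.items()}


severity = 2.0
print("tail P(loss > 4):", round(mixture_tail(4.0, severity), 4))
print("posterior given loss = 6:", regime_posterior(6.0, severity))
```

With these numbers the density has two modes, one per regime, and an observed large loss shifts nearly all posterior weight to the recession regime.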

When to discretise and when not to

Discretise when: the variable is naturally categorical (control state: effective/partially effective/ineffective); the question is directional rather than precise (is risk high or low?); expert elicitation of probabilities is more reliable than expert elicitation of distribution parameters; the model is used for causal structure questions rather than tail risk quantification; and the graph has many nodes where CPT tractability matters.

Use continuous nodes when: the variable is a financial quantity where the magnitude matters (loss amount, reserve estimate); the tail behavior is the quantity of interest; the variable enters the model as a sum or product of other continuous variables; and the conditional distribution is well-approximated by a parametric family whose parameters can be estimated from data.

Use a hybrid architecture when: the model contains both structural (causal direction) questions and quantitative (tail risk) questions; some nodes are naturally discrete (regime, state) and others are naturally continuous (loss amount, frequency); and the CLG or mixture-of-distributions structure is sufficient to represent the conditional relationships.

The practical recommendation: build the initial model with full discretisation, validate the causal structure, and then selectively replace the outcome nodes — the nodes whose distributional accuracy matters most — with continuous or hybrid representations. The causal structure validated on the discrete model is preserved; only the distributional representation of the key outcome nodes changes.

In the cases

Healthcare

Iatrogenic Medications

BMI and LDL are continuous — the model uses a Gaussian CLG to preserve distributional accuracy rather than discretising into Low / Medium / High.

Healthcare

Statins & Hospitalisation

The statin case uses a Gaussian CLG structural causal model — explicitly hybrid — to model the continuous clinical variables and their causal relationships.

Next Step

If your BN produces directional guidance but your capital model needs tail distributions, the hybrid architecture bridges both requirements.

info@rung3.ai

Node characteristic	Discrete	Continuous (CLG)
Naturally categorical (e.g. control state, failure mode)	✓ correct representation	—
Question is directional (high / low risk)	✓ simpler elicitation	—
Model needs tail risk / VaR / TVaR	✗ tail resolution lost	✓ full distribution preserved
Variable is a continuous physiological or financial measure	✗ boundary sensitivity lost	✓ preserves sensitivity at boundaries
Expert elicitation of distributions is tractable	—	✓ Gaussian parameters are elicitable