Bayesian Risk Decision-Making

The distinction between a risk assessment model and a risk decision model is structural. A risk assessment model contains events, the state of the system they strike, and consequences. A risk decision model adds two more variable types, desired outcomes and available actions, and encodes the causal relationships among all five. Those two additions are what convert a model that describes risk into a model that can be queried for the optimal response to it.

Rung3.ai is causal AI for decision support under uncertainty: a Bayesian-network architecture grounded in expert structural knowledge rather than in historical fit. The case for that approach predates the current wave of interest in AI:

“Decision makers in all areas of life (including physicians, generals, scientists, bankers and politicians) must often assess and manage risk when there is little or no direct historical data to draw upon, or where relevant data is difficult to identify. The international credit crisis was not predicted by the world’s leading financial analysts because they relied on models based on historical statistical data that could not adapt to new circumstances — even when those circumstances (in this case the collapse of the mortgage sub-prime market) were foreseeable by experts with more intimate knowledge of the market place. The challenges are similarly acute when the source of the risk is novel: terrorist attacks, ecological disasters, major project failures, and more general failures of novel systems, market-places and business models.”

— Fenton, N. & Neil, M., Risk Assessment and Decision Analysis with Bayesian Networks, 2nd ed., CRC Press, 2018. On video: When machine learning from data is doomed to fail.

The Five Variable Types

Events

The adverse events the model is designed to reason about. An earthquake, a cyberattack, a component failure, a regulatory breach, a counterparty default. Events are not the same as consequences — they are the initiating causes. The model may include multiple events, and events may cause other events: a cyberattack can trigger a service outage which triggers a regulatory breach.

Events are typically binary (occurred / did not occur) or categorical, with a prior probability that represents the unconditional likelihood before any evidence is observed. That prior is the starting point for all forward inference from the model.

State

The vulnerability or resilience of the system at the time of the event. The same event produces very different consequences depending on the system's state. An earthquake in a region with anti-seismic construction and trained emergency responders produces fewer casualties than the same earthquake in a region without either. A cyberattack against an organization with mature patch management and network segmentation produces a smaller breach than the same attack against one without.

State variables are the causal intermediaries between events and consequences. They are also the primary target for pre-active risk management: you cannot prevent the earthquake, but you can change the state of the system it hits. Actions that reduce vulnerability operate through state variables.

Consequences

The outcomes the organization cares about: financial loss, casualties, regulatory penalties, reputational damage, infrastructure failure, operational disruption. Consequences are caused jointly by events and state. In the causal path Event → State → Consequence, the event strikes the system, and the state of the system determines how much of that shock becomes a consequence. This structure is what allows the model to distinguish between the severity of the initiating event and the vulnerability of the system it strikes.

Consequences are typically the nodes queried first in forward inference: given this event and this system state, what is the probability distribution over each consequence dimension? They are also the nodes used as targets in backward inference: given that casualties must remain below a threshold, what state variables must be at what levels?
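The two directions of inference can be sketched on a three-node toy network. This is a minimal illustration, not the production model: it assumes Event and State are the two parents of a binary "severe consequence" node, every probability is a made-up number, and the names P_EVENT, P_STATE, P_SEVERE and prob are hypothetical. Inference is done by brute-force enumeration, which only works for very small networks.

```python
from itertools import product

# All numbers below are hypothetical, chosen only for illustration.
P_EVENT = {True: 0.02, False: 0.98}                # prior on the event
P_STATE = {"resilient": 0.6, "vulnerable": 0.4}    # prior on system state
# P(severe consequence | event, state)
P_SEVERE = {
    (True, "resilient"): 0.10,
    (True, "vulnerable"): 0.70,
    (False, "resilient"): 0.0,
    (False, "vulnerable"): 0.0,
}

def joint(event, state, severe):
    """Joint probability of one complete assignment."""
    p_c = P_SEVERE[(event, state)]
    return P_EVENT[event] * P_STATE[state] * (p_c if severe else 1.0 - p_c)

def prob(query, evidence):
    """P(query | evidence) by enumerating every assignment."""
    num = den = 0.0
    for event, state, severe in product([True, False], P_STATE, [True, False]):
        world = {"event": event, "state": state, "severe": severe}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(event, state, severe)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den

# Forward: given the event and a vulnerable state, how likely is a severe outcome?
forward = prob({"severe": True}, {"event": True, "state": "vulnerable"})
# Backward: given a severe outcome, how likely is it that the system was vulnerable?
backward = prob({"state": "vulnerable"}, {"severe": True})
print(f"forward={forward:.3f} backward={backward:.3f}")
```

The same prob function answers both questions; only the roles of query and evidence change. A real model would replace enumeration with a proper inference engine.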

Desired outcomes

The constraints or targets on consequences that represent acceptable risk: casualties below a threshold, financial loss within appetite, probability of regulatory breach below a limit. Desired outcomes translate the organization's risk appetite from a policy statement into a formal node in the influence diagram — a utility specification that the optimization can act on.

This is the variable type that is most commonly absent from Bayesian risk models built for assessment rather than decision. Without desired outcome nodes, the model can compute the probability distribution over consequences. It cannot identify the actions that produce consequences within the desired range. Adding desired outcomes converts the assessment model into a decision model.

Actions

The available interventions: preventive controls, mitigation measures, emergency responses, legislative or policy changes. Actions operate causally on state variables (improving system resilience), on event probabilities (reducing the likelihood of the initiating event), or directly on consequences (emergency response reduces casualties after an event occurs). The model encodes which actions affect which variables through which mechanisms.

Actions are decision nodes in the influence diagram — nodes that the decision-maker controls, not nodes whose values are uncertain. The model's optimization identifies which action configuration, subject to constraints (budget, technical feasibility, political constraints), produces consequences most consistent with desired outcomes.

Why the causal order is not optional

The five variable types must appear in the causal order Events → State → Consequences ← Desired Outcomes ← Actions (where ← indicates backward influence in the optimization). Encoding an action as acting directly on an event, when its real mechanism is a change in system state, conflates the two: a control does not prevent the earthquake, it reduces the damage the earthquake causes by changing the system's vulnerability. Getting the causal order wrong produces a model that cannot correctly decompose the effect of an action into its component mechanisms, which makes the optimization unreliable and the attribution incorrect.

Constraints as a sixth structural element

In practice, a sixth element is always present: the constraints graph. Budget limits, technical capability limits, regulatory requirements, and feasibility constraints bound the action space. The constraints are not causal variables (they do not cause anything), but they restrict which action configurations the optimization considers. Encoding constraints explicitly, rather than implicitly by pruning the options available at each action node, keeps the model interpretable: a constrained optimization can report not just the optimal action but also the cost of the binding constraint, that is, what would improve if the budget were larger or the technical capability greater.
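A small sketch of what "the cost of the binding constraint" can mean in practice. It assumes three hypothetical controls that each multiply the probability of a vulnerable state, made-up loss figures in thousands of dollars, and exhaustive search over action sets. The names CONTROLS, expected_loss and best_under_budget are illustrative only, and the multiplicative effect is a stand-in for inference in the actual network.

```python
from itertools import combinations

# Hypothetical controls: name -> (cost in $k, multiplier on P(vulnerable)).
CONTROLS = {
    "patching": (300, 0.5),
    "segmentation": (500, 0.4),
    "training": (200, 0.9),
}
P_EVENT = 0.05                # assumed annual probability of the event
P_VULNERABLE_BASE = 0.6       # assumed P(vulnerable) with no controls
LOSS_IF_VULNERABLE = 10_000   # $k loss if the event hits a vulnerable system
LOSS_IF_RESILIENT = 1_000     # $k loss if the event hits a resilient system

def expected_loss(chosen):
    """Expected loss (excluding control spend) for a set of controls."""
    p_vuln = P_VULNERABLE_BASE
    for name in chosen:
        p_vuln *= CONTROLS[name][1]
    return P_EVENT * (p_vuln * LOSS_IF_VULNERABLE + (1 - p_vuln) * LOSS_IF_RESILIENT)

def best_under_budget(budget):
    """Lowest-expected-loss action set whose total cost fits the budget."""
    feasible = []
    for r in range(len(CONTROLS) + 1):
        for combo in combinations(CONTROLS, r):
            cost = sum(CONTROLS[name][0] for name in combo)
            if cost <= budget:  # the constraint is explicit, not hidden in the model
                feasible.append((expected_loss(combo), cost, combo))
    return min(feasible)

loss_500, cost_500, plan_500 = best_under_budget(500)
loss_800, cost_800, plan_800 = best_under_budget(800)
# What the binding budget constraint costs: loss avoided if the budget were relaxed.
value_of_relaxing = loss_500 - loss_800
print(plan_500, plan_800, round(value_of_relaxing, 1))
```

Because the budget is a separate input rather than something baked into the action options, the same model can be re-solved at a different budget and the difference reported directly.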

Three Query Modes on the Same Model

Pre-active — before the event

Forward inference from events to consequences, with actions as the variable of interest.

Before an adverse event occurs, the question is: where is the system most vulnerable, and which actions most cost-effectively reduce expected harm? The model is queried by setting event probabilities at their priors and varying action configurations to observe the effect on the consequence distribution.

This mode is best suited to capital allocation decisions: which controls, at which cost, produce the greatest expected reduction in adverse consequences? The answer is the risk management budget allocation that maximizes expected utility. It is the same calculation as any other capital budgeting problem, made possible by a causal model that connects action spend to consequence outcomes through quantified mechanisms.

→ Typical question: “Given our current state and the available actions, what is the optimal allocation of our $2M risk management budget to minimize expected loss from the top five risk scenarios?”

Reactive — during the event

Mixed inference from current observations to current state and projected consequences, with real-time action selection.

Once an event is occurring or has been detected, the question shifts: given what is currently observed, what is the current state of the system, what consequences are most probable, and what is the optimal immediate response? The model is queried by setting observed variables as evidence and reading the posterior over unobserved state and consequence variables.

Reactive mode is diagnostic reasoning under time pressure: the model identifies the most probable current state from partial observations, projects the consequence distribution forward, and identifies which available actions most cost-effectively alter the trajectory. The same inference mechanism used for power-transformer root-cause analysis applies here; the difference is that the consequence node is in the future rather than the past.

→ Typical question: “It is 2:47 AM, the Buchholz relay has activated, oil temperature is elevated, and we have no visible external damage. What is the current most probable failure mode, and should we repair or replace?”
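The diagnostic step in a question like this can be sketched as a posterior over failure modes given the observed indicators. Everything below is hypothetical: the three failure modes, their priors, and the likelihoods are placeholder values rather than engineering data, and the observations are assumed conditionally independent given the failure mode, which a full network would not need to assume.

```python
# Hypothetical prior over failure modes (illustrative values only).
PRIOR = {
    "internal_arcing": 0.2,
    "insulation_overheating": 0.3,
    "sensor_false_alarm": 0.5,
}
# Hypothetical P(observation is True | failure mode).
LIKELIHOOD = {
    "buchholz_activated": {
        "internal_arcing": 0.9, "insulation_overheating": 0.6, "sensor_false_alarm": 0.1,
    },
    "oil_temp_elevated": {
        "internal_arcing": 0.5, "insulation_overheating": 0.9, "sensor_false_alarm": 0.05,
    },
    "external_damage": {
        "internal_arcing": 0.2, "insulation_overheating": 0.05, "sensor_false_alarm": 0.01,
    },
}

def posterior(observed):
    """P(failure mode | observations) via Bayes' rule and normalization."""
    scores = {}
    for mode, prior in PRIOR.items():
        p = prior
        for obs, value in observed.items():
            p_true = LIKELIHOOD[obs][mode]
            p *= p_true if value else 1.0 - p_true
        scores[mode] = p
    total = sum(scores.values())
    return {mode: p / total for mode, p in scores.items()}

evidence = {
    "buchholz_activated": True,
    "oil_temp_elevated": True,
    "external_damage": False,
}
post = posterior(evidence)
most_probable = max(post, key=post.get)
print(most_probable, {m: round(p, 3) for m, p in post.items()})
```

The repair-or-replace decision would then weigh the cost of each action against this posterior; that step is omitted here.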

Pro-active — preventing catastrophe

Backward inference from desired outcomes to required actions — the “target-to-action” direction.

The most distinctive query mode: rather than propagating forward from events to consequences, the model propagates backward from a specified consequence target to identify the actions and state conditions that most efficiently achieve it. Set the desired outcome node — “probability of casualties exceeding threshold must be below 5%” — and ask: which combination of state improvements and preventive actions satisfies this constraint at minimum cost?

This is the mode that converts a risk assessment into a risk management specification. It answers not “what is the risk?” but “what must be true about our system and our actions for the risk to be acceptable?” — which is the question that a board setting risk appetite actually wants answered, and which no standard risk tool can compute.

→ Typical question: “Our board has set a tolerance of less than 5% probability of a loss exceeding $10M. Working backward through the causal model: what state conditions must hold, and what action set achieves those conditions, at minimum total cost?”
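A target-to-action query of this kind can be sketched as a search for the cheapest action set that meets the tolerance. The actions, costs, baseline probability, and the multiplicative effect of each action are all assumptions made for illustration; in a real model the probability of a large loss would come from inference on the network, not from a product of multipliers.

```python
from itertools import combinations

# Hypothetical actions: name -> (cost in $k, multiplier on P(loss > $10M)).
ACTIONS = {
    "backup_site": (400, 0.5),
    "monitoring": (150, 0.8),
    "redundant_supplier": (250, 0.6),
}
BASELINE_P_LARGE_LOSS = 0.12   # assumed P(loss > $10M) with no action
TOLERANCE = 0.05               # board tolerance from the question above

def p_large_loss(chosen):
    """Assumed P(loss > $10M) after applying a set of actions."""
    p = BASELINE_P_LARGE_LOSS
    for name in chosen:
        p *= ACTIONS[name][1]
    return p

def cheapest_plan_meeting(tolerance):
    """Minimum-cost action set with P(large loss) below the tolerance, or None."""
    candidates = []
    for r in range(len(ACTIONS) + 1):
        for combo in combinations(ACTIONS, r):
            if p_large_loss(combo) < tolerance:
                cost = sum(ACTIONS[name][0] for name in combo)
                candidates.append((cost, combo))
    return min(candidates) if candidates else None

plan = cheapest_plan_meeting(TOLERANCE)
print(plan)
```

This is the mirror image of a budget-constrained query: the risk target is fixed and cost is minimized, rather than the budget being fixed and risk minimized.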

Why three modes on one model is more than three separate models

Each of the three query modes produces output that is consistent with the others, because all three are inference on the same causal structure. The pre-active analysis identifies the optimal budget allocation. The reactive mode applies the same model in real time as events unfold. The pro-active mode back-propagates from the consequence targets that the pre-active analysis was designed to meet. The three modes form a coherent cycle: plan, respond, verify. Separate models for each phase would be calibrated independently and could therefore produce inconsistent answers: the reactive model might identify an optimal response that violates the consequence targets set in the pro-active analysis.

Time-Varying Risk — Dynamic Bayesian Networks

A Dynamic Bayesian Network extends the static model with inter-slice arcs that connect variables at time t to their counterparts at time t+1, encoding how each variable evolves rather than just how they relate within a single period. This is the structure behind stress testing, early warning indicators, and multi-period scenario simulation.

The full treatment — transition structures, stress testing, early warning applications, and limitations — is on the Dynamic Bayesian Networks page.
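As a minimal sketch of the inter-slice idea, the simplest dynamic network has one hidden condition variable that evolves from t to t+1 and one noisy indicator observed in each period. The two states, the transition probabilities, and the alarm probabilities below are hypothetical, and filter_step is an illustrative name for one predict-then-update cycle.

```python
STATES = ("healthy", "degraded")

# Inter-slice arc: assumed P(state at t+1 | state at t).
TRANSITION = {
    "healthy": {"healthy": 0.9, "degraded": 0.1},
    "degraded": {"healthy": 0.0, "degraded": 1.0},
}
# Intra-slice arc: assumed P(alarm | state) for an early warning indicator.
P_ALARM = {"healthy": 0.05, "degraded": 0.6}

def filter_step(belief, alarm):
    """Advance the belief one time slice, then condition on the alarm reading."""
    predicted = {
        s2: sum(belief[s1] * TRANSITION[s1][s2] for s1 in STATES)
        for s2 in STATES
    }
    weighted = {
        s: predicted[s] * (P_ALARM[s] if alarm else 1.0 - P_ALARM[s])
        for s in STATES
    }
    total = sum(weighted.values())
    return {s: w / total for s, w in weighted.items()}

belief = {"healthy": 1.0, "degraded": 0.0}
history = []
for alarm in [False, True, True]:   # hypothetical indicator readings, one per period
    belief = filter_step(belief, alarm)
    history.append(belief["degraded"])
print([round(p, 3) for p in history])
```

Two consecutive alarms move the belief in "degraded" far more than a single one would, which is the behavior an early warning indicator relies on.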

Importance Factors — What the Model Knows About Its Own Sensitivity

The risk-to-budget connection is the practical payoff of the importance measures. The question every capital allocation decision reduces to is: given a fixed budget, which investments in controls or state improvements produce the greatest expected reduction in harm? Importance factors make that computation tractable — they rank every variable in the model by its expected contribution to risk, so the investment ranking follows directly.

A Bayesian risk model does not just produce a probability of failure. It produces a family of importance measures that quantify how sensitive the outcome is to each variable — which components, if they failed, would most increase system risk; which controls, if they improved, would most reduce it. Four measures are standard, each answering a distinct question:

Risk Augmentation Factor (RAF)

How much does system failure probability increase if component i fails? RAF = P(system failure | component i failed) / P(system failure). Values above 1 indicate that component failure increases system risk. Used to identify which component failures are most consequential — the weakest links in the causal chain.

Risk Diminution Factor (RDF)

How much does system failure probability decrease if component i is made perfectly reliable? RDF = P(system failure) / P(system failure | component i perfect). Values above 1 indicate that improving component reliability reduces system risk. Used to identify the highest-value targets for reliability investment — the components where investment produces the greatest system-level improvement.

Birnbaum's Importance Factor

The marginal change in system reliability per unit change in component i's reliability. This is the derivative of system reliability with respect to component reliability — the rate at which system performance improves as the component improves. Used when comparing investment options that produce incremental rather than perfect reliability improvements.

Vesely-Fussell Diagnostic Factor

Given that a system failure has occurred, what is the probability that component i contributed to it? This is the diagnostic complement to the RAF: where RAF measures the forward impact of component failure on system risk, the Vesely-Fussell factor (more often written Fussell-Vesely in the reliability literature) measures the backward attribution of a system failure to its component causes. Useful for post-incident root-cause prioritization.

These four measures are computed from the same Bayesian network, using the same belief propagation mechanism as any other query. They do not require a separate sensitivity analysis tool. They are standard outputs of the model — and together they provide a complete picture of where the system is vulnerable (RAF), where investment is most productive (RDF and Birnbaum), and which components to investigate first after a failure (Vesely-Fussell).
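The four measures can be computed side by side on a toy system. The sketch assumes a hypothetical three-component system (A in series with a parallel pair B and C), independent component failures, and made-up failure probabilities. The diagnostic factor is approximated here by the posterior P(component i failed | system failure), which is a simplification of the classical cut-set definition.

```python
from itertools import product

# Hypothetical, independent component failure probabilities.
P_FAIL = {"A": 0.01, "B": 0.10, "C": 0.20}
COMPONENTS = tuple(P_FAIL)

def system_fails(failed):
    """Assumed structure: A in series with the parallel pair (B, C)."""
    return failed["A"] or (failed["B"] and failed["C"])

def p_system_failure(fixed=None):
    """P(system failure | fixed component states), enumerating the rest."""
    fixed = fixed or {}
    free = [c for c in COMPONENTS if c not in fixed]
    total = 0.0
    for outcome in product([True, False], repeat=len(free)):
        failed = dict(fixed, **dict(zip(free, outcome)))
        weight = 1.0
        for c, is_failed in zip(free, outcome):
            weight *= P_FAIL[c] if is_failed else 1.0 - P_FAIL[c]
        if system_fails(failed):
            total += weight
    return total

def importance(c):
    base = p_system_failure()
    given_failed = p_system_failure({c: True})
    given_perfect = p_system_failure({c: False})
    return {
        "RAF": given_failed / base,
        "RDF": base / given_perfect,   # undefined if given_perfect is zero
        "Birnbaum": given_failed - given_perfect,
        "diagnostic": P_FAIL[c] * given_failed / base,
    }

table = {c: importance(c) for c in COMPONENTS}
for c, row in table.items():
    print(c, {k: round(v, 3) for k, v in row.items()})
```

In this toy system A has by far the largest RAF and Birnbaum factor because it is a single point of failure, while B and C have the larger RDF because A is already very reliable and leaves little room for improvement.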

The connection to capital allocation

The RDF and Birnbaum factors are the formal inputs to the risk management budget allocation problem. Rank available controls and investments by the expected risk reduction they deliver, as measured by the RDF or Birnbaum factor, per dollar spent. The optimal allocation is the set of investments that maximizes total system reliability improvement subject to the budget constraint; the per-dollar ranking is a practical first approximation to it, because the effects of different investments can interact. This is the pre-active query mode stated as an optimization, and the importance factors are what make it computable rather than qualitative.
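One way to write this allocation down, under the simplifying assumption that the risk reductions of individual investments add up, is as a 0-1 knapsack problem. The symbols are introduced here for illustration only: x_j indicates whether investment j is funded, \Delta_j is its expected reduction in system failure probability (for example, its Birnbaum factor times the reliability improvement it buys), c_j is its cost, and B is the budget.

```latex
\max_{x \in \{0,1\}^n} \; \sum_{j=1}^{n} \Delta_j \, x_j
\quad \text{subject to} \quad
\sum_{j=1}^{n} c_j \, x_j \le B
```

Ranking by \Delta_j / c_j is the usual greedy heuristic for this problem. When investments interact, the additive objective no longer holds and each candidate set has to be evaluated on the model itself.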

Scaling to Large Systems

Object-oriented structure

Large systems are typically composed of repeated components — a fleet of identical transformers, a set of similar manufacturing cells, a portfolio of comparable properties. Object-oriented Bayesian networks introduce the concept of a class: a network template that encodes the causal structure shared by all instances of a component type. Each instance of the class inherits the structure but can have different parameter values — different failure rates, different maintenance histories, different operating environments.

This separation of structure from parameters produces two benefits. First, the model for a fleet of 200 transformers does not have to be specified as a flat network of 200 × n nodes, where n is the number of nodes in the class; it is one class definition with 200 parameterized instances. The graph remains interpretable; only the parameters vary. Second, information about one instance can update the parameters of similar instances through hierarchical Bayesian pooling: an early failure on transformer T-4471 is evidence about the failure rate distribution for the entire cohort, and the model can update cohort-level parameters accordingly.
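The class-and-instance idea and the cohort update can be sketched with a conjugate Beta-Binomial model. This is a deliberately simplified stand-in for hierarchical pooling: it assumes a single shared annual failure probability for the cohort, treats each unit-year as one trial, and uses made-up priors and observation counts. T-4472 and T-4473 are hypothetical unit IDs added for the example.

```python
from dataclasses import dataclass

@dataclass
class TransformerClass:
    """Template shared by all instances: Beta prior on annual failure probability."""
    alpha: float
    beta: float

    def mean(self):
        return self.alpha / (self.alpha + self.beta)

@dataclass
class TransformerInstance:
    """One unit of the class, carrying its own observation history."""
    unit_id: str
    years_observed: int
    failures: int

def update_cohort(template, instances):
    """Conjugate update of the cohort-level prior from all instance histories."""
    failures = sum(i.failures for i in instances)
    survived = sum(i.years_observed - i.failures for i in instances)
    return TransformerClass(template.alpha + failures, template.beta + survived)

prior = TransformerClass(alpha=1.0, beta=99.0)   # assumed prior mean of 1% per unit-year
fleet = [
    TransformerInstance("T-4471", years_observed=2, failures=1),
    TransformerInstance("T-4472", years_observed=2, failures=0),
    TransformerInstance("T-4473", years_observed=2, failures=0),
]
posterior = update_cohort(prior, fleet)
print(round(prior.mean(), 4), round(posterior.mean(), 4))
```

The early failure on T-4471 raises the cohort-level failure estimate that every other instance inherits. A fuller hierarchical model would also let each unit keep its own rate, drawn from the cohort distribution.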

Hierarchical decomposition

Complex consequence variables — “total economic loss,” “reputational damage,” “regulatory exposure” — can be decomposed into more elementary sub-variables at whatever level of resolution the decision requires. The five-variable architecture provides the skeleton at the top level; each node can be expanded into a sub-network that models its internal causal structure in more detail.

The practical implication: a board-level model might have fifteen nodes and answer strategic allocation questions. The same model, with the consequence nodes expanded, becomes an operational model that answers incident-response questions. The same model, with the state variables expanded, becomes a technical model that answers engineering questions. All three levels are consistent because they are decompositions of the same causal structure, not separate models that must be reconciled.

In the cases

Insurance

Property Insurance

The five-variable architecture is visible throughout: hazard exposure, vulnerability factors, moderating conditions, intervention nodes, and loss outcome.

Compliance

NIST CSF 2.0

Threat likelihood, control effectiveness, breach probability, and financial impact follow the canonical architecture exactly — each node in its causal order.

Utilities

Utility Wildfire Risk

Equipment condition, environmental factors, ignition probability, and liability exposure are the architecture instantiated in a utility's risk model.

Next Step

The five-variable architecture applies to every risk domain your organization manages. The question is whether your current models contain all five variable types — or whether they stop at consequences, leaving the decision half of the model unbuilt.

info@rung3.ai

Tchangani, A.P., 2021, “Bayesian Networks in Risk Informed Decision-Making,” Advances in Mathematics Research, vol. 29, Nova Science Publishers.

Jensen, F.V. & Nielsen, T.D., 2007, Bayesian Networks and Decision Graphs, 2nd ed., Springer.

Birnbaum, Z.W., 1969, “On the Importance of Different Components in a Multicomponent System,” Multivariate Analysis II, Academic Press.

Murphy, K.P., 2002, Dynamic Bayesian Networks, PhD thesis, UC Berkeley.
