Utility reliability analysis typically works backward from events: a feeder failed, the SAIDI contribution is calculated, the inspection log is reviewed. The resulting report describes what happened. It does not answer the question the board is actually asking — whether the failure was preventable, which capital decisions are genuinely driving outage risk, and what the evidence on a symptomatic feeder is actually telling you. These are causal questions. They require a model of the system that generated the events, not just a record of the events themselves.

The confounding problem is structural. Capital budget pressure does not only cause deferral of formal replacement programs — it simultaneously reduces routine maintenance spend across the fleet. When a board observes that deferred utilities have higher outage rates, it is looking at two effects at once: the deferral itself, and the maintenance degradation that tends to accompany it. Treating these as one effect produces the wrong capex decision. A causal model holds the two pathways apart.

| Analysis Component | Standard Approach | Causal Approach |
| --- | --- | --- |
| Post-event review | Documents what failed and when | Estimates what would have happened under different decisions |
| Capital deferral effect | Correlated with outage rate (confounded) | Isolated from maintenance degradation via do() intervention |
| Root cause diagnosis | Operations team opinion vs. asset team opinion | Posterior distribution over causes given observed fault data |
| Preventability assessment | Not quantified; narrative only | Counterfactual probability: P(failure \| inspection on schedule) |
| Budget pressure effect | Not in the model | Explicit confounder node; its two pathways are separated |

A deferred replacement program and a degraded maintenance program tend to arrive together. The board that treats them as one problem funds the wrong solution.

3 Questions, 3 Rungs
  1. The Ridgeline feeder failed during last August’s heat dome after an 18-month inspection deferral — would it have failed even if inspection had been on schedule? — Rung 3 (Counterfactual). Answering it requires abduction to anchor the actual heat dome severity, the asset’s age, and the observed failure before changing only the inspection decision; Asset Age and Load Stress are confounders that must be held at their actual values.
  2. If we defer the substation transformer replacement program by two years, what does that actually do to forced outage probability at peak summer load? — Rung 2 (Intervention). A do() query severs the confounding path from Capital Budget Pressure through the deferral decision, isolating the causal effect of the interval extension from the budget environment that tends to accompany it.
  3. A feeder serving a hospital district is showing elevated fault frequency — is this worse storms, aging infrastructure, or overdue vegetation management corridors? — Rung 1 (Association). The graph encodes which dependencies exist between Weather Severity, Asset Age, Vegetation State, and fault outcomes; entering the observed fault trend updates every genuinely connected upstream node, separating what the evidence supports from what each team is asserting.

Reading the screenshots: a black check mark on a node means it has been set as observed evidence — a fact entered into the model, acting as a filter. A red check mark means it has been set as a do intervention — a decision applied to the model, severing the influence of its parents.

Reading the spec tables: each Run the Analysis block lists the exact steps to reproduce each screenshot in Bayes Server. The Obs / Do column uses three italic control tokens: clear — reset the model to a blank no-evidence state; abduction step — enter the factual observations that anchor the U nodes to this specific case; use abduction result — apply a do() intervention with the U nodes held from the abduction step.

Rung 3 — Counterfactual

Would the Ridgeline feeder have failed anyway?

“The Ridgeline feeder failed during last August’s heat dome. We deferred its inspection by 18 months. Would it have failed even if inspection had been on schedule?”

This question conditions on a specific past event — a real feeder, a real storm — and asks what a different decision would have changed. It cannot be answered by comparing deferred feeders to on-schedule feeders in the data, because the feeders that get deferred are already different: they tend to be in constrained-budget corridors where routine maintenance is also lower. The model separates those two effects. It then anchors itself to the background of this specific feeder — the things that were true about it that are not in the data — by entering the actual failure as observed evidence. Only once those background conditions are fixed does it ask: in this specific situation, with this specific feeder, what would the inspection have changed?

The U nodes (shown in orange in the model) represent those unobserved background factors — wildlife contact history, latent insulation weakness, localized corrosion — that are particular to this asset. They persist into the counterfactual world unchanged. This is what makes the answer individual rather than statistical: it is not asking what inspection does on average, but what it would have done here.
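The abduction, action, prediction sequence described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the Bayes Server model: the three U factors, their prior probabilities, and the structural equations in `feeder_fails` are all hypothetical, chosen only to show why a counterfactual for one feeder differs from a fleet-level intervention.

```python
from itertools import product

# Hypothetical priors for three unobserved background factors (U nodes).
P_U = {
    "latent_weakness": 0.30,   # defect that no inspection would catch
    "catchable_defect": 0.40,  # defect a scheduled inspection would have fixed
    "heat_fragile": 0.35,      # fails under extreme heat whatever its condition
}

def feeder_fails(inspection_deferred, heat_extreme, u):
    """Hypothetical structural equations: deterministic once U is fixed."""
    deteriorated = u["latent_weakness"] or (inspection_deferred and u["catchable_defect"])
    return heat_extreme and (deteriorated or u["heat_fragile"])

def u_worlds():
    """Yield every joint U assignment together with its prior probability."""
    names = list(P_U)
    for values in product([True, False], repeat=len(names)):
        u = dict(zip(names, values))
        weight = 1.0
        for name in names:
            weight *= P_U[name] if u[name] else 1.0 - P_U[name]
        yield u, weight

def counterfactual_failure_probability():
    # Abduction: keep only the U worlds consistent with what actually happened
    # (inspection deferred, heat extreme, feeder failed).
    consistent = [(u, w) for u, w in u_worlds() if feeder_fails(True, True, u)]
    evidence = sum(w for _, w in consistent)
    # Action and prediction: force inspection on schedule, keep the same U worlds.
    still_fails = sum(w for u, w in consistent if feeder_fails(False, True, u))
    return still_fails / evidence

def interventional_failure_probability():
    """Fleet-level do(inspection on schedule) under extreme heat, no abduction."""
    return sum(w for u, w in u_worlds() if feeder_fails(False, True, u))

print(round(counterfactual_failure_probability(), 3))
print(round(interventional_failure_probability(), 3))
```

With these made-up numbers the feeder-specific counterfactual comes out near 0.75 while the fleet-level intervention gives 0.545. The difference is the information carried by the observed failure: it makes the unfavourable background factors more probable for this asset, and those factors persist when the inspection decision is changed.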

Answer

Probably yes. The inspection would have made a difference, but not enough to prevent the failure. With only the heat dome entered, failure probability rises to 53.8%: the feeder was already in serious jeopardy before the inspection question is asked. Entering the actual failure as evidence (abduction) shifts Asset Condition to 38.9% Deteriorated, and the model infers Inspection Schedule at 41.4% Deferred; it reconstructs the background of this specific asset from what actually happened. Forcing inspection to On Schedule in the counterfactual world improves Asset Condition to 31.6% Deteriorated, a meaningful 7.3-point reduction. But Outage Consequence stays at 57.0% Severe, because the feeder's failure is still anchored and the heat dome remains extreme. The honest answer: the inspection gap worsened the asset's condition in a material way, but the underlying heat exposure and the feeder's pre-existing state meant failure was the dominant outcome regardless. The deferral was not the cause; it was a missed chance to reduce a risk that was already severe.

UtilityGridCounterfactual.bayes
| Image | Obs / Do | Node | Set | Result |
| --- | --- | --- | --- | --- |
| ugr-c-1 | | Feeder Failure | | 36.9% Failed / 31.1% Degraded / 32.0% Intact |
| | | Outage Consequence | | 29.9% Severe / 32.3% Moderate / 37.8% Contained |
| ugr-c-2 | obs | Heat Dome Severity | Extreme | |
| | | Feeder Fault Probability | | 60.6% High |
| | | Feeder Failure | | 53.8% Failed / 27.4% Degraded / 18.8% Intact |
| | | Outage Consequence | | 38.1% Severe |
| ugr-c-3 | obs | Heat Dome Severity | Extreme | |
| | obs | Feeder Failure | Failed | |
| | | Asset Condition | | 38.9% Deteriorated (from 32.8%) |
| | | Inspection Schedule | | 41.4% Deferred (from 39.0%) |
| | | Outage Consequence | | 57.0% Severe |
| ugr-c-4 | obs | Heat Dome Severity | Extreme | |
| | obs | Feeder Failure | Failed | |
| | do | Inspection Schedule | On Schedule | |
| | | Asset Condition | | 31.6% Deteriorated (from 38.9%) |
| | | Outage Consequence | | 57.0% Severe (unchanged) |

[Screenshot: Utility Grid Counterfactual, prior with no evidence set]

Fleet-level baseline before any evidence. Feeder Failure: 36.9% Failed, 31.1% Degraded, 32.0% Intact. Outage Consequence: 29.9% Severe.

Rung 2 — Intervention

What does deferring transformer replacement actually cause?

“If we defer the substation transformer replacement program by two years to manage capital spend, what happens to the probability of a forced outage at peak summer load — and how much of that risk is the deferral itself versus the budget environment it comes from?”

When a board observes that utilities with deferred replacement programs have higher outage rates, it is seeing two things at once. Capital budget pressure independently reduces routine maintenance spend, so deferred programs arrive with degraded maintenance quality already baked in. Simply observing deferred programs in the data conflates those two effects. The model separates them: setting the program as deferred via observation tells you what deferred programs tend to look like; forcing the program to deferred via intervention tells you what deferral alone would cause, holding maintenance quality at whatever it would have been. The gap between those two numbers is the confound the board needs to understand before it decides.
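The difference between observing a deferral and forcing one can be reproduced on a four-node toy network. The sketch below is an assumption-laden illustration rather than the published model: the 30% prior on high budget pressure is borrowed from the spec table, and every conditional probability is hypothetical. An observation filters the joint distribution; an intervention replaces the deferral node's dependence on budget pressure with a fixed value.

```python
from itertools import product

# Hypothetical conditional probabilities (True = the adverse state).
P_BUDGET_HIGH = 0.30                               # prior borrowed from the spec table
P_DEFER = {True: 0.60, False: 0.20}                # P(deferred | budget pressure high?)
P_MAINT_REDUCED = {True: 0.50, False: 0.15}        # P(maintenance reduced | budget pressure high?)
P_OUTAGE_HIGH = {                                  # P(outage high | deferred, maintenance reduced)
    (True, True): 0.60, (True, False): 0.35,
    (False, True): 0.40, (False, False): 0.15,
}

def bern(p, value):
    """Probability that a binary variable with P(True) = p takes `value`."""
    return p if value else 1.0 - p

def joint(budget_high, deferred, maint_reduced, outage_high, do_deferred=None):
    p = bern(P_BUDGET_HIGH, budget_high)
    if do_deferred is None:
        # Normal mechanism: deferral depends on budget pressure.
        p *= bern(P_DEFER[budget_high], deferred)
    else:
        # Intervention: the link from budget pressure to deferral is cut.
        p *= 1.0 if deferred == do_deferred else 0.0
    p *= bern(P_MAINT_REDUCED[budget_high], maint_reduced)
    p *= bern(P_OUTAGE_HIGH[(deferred, maint_reduced)], outage_high)
    return p

def p_outage_high(observe_deferred=None, do_deferred=None):
    """P(outage high), optionally conditioning on or intervening on deferral."""
    num = den = 0.0
    for b, d, m, o in product([True, False], repeat=4):
        if observe_deferred is not None and d != observe_deferred:
            continue
        p = joint(b, d, m, o, do_deferred)
        den += p
        if o:
            num += p
    return num / den

observed = p_outage_high(observe_deferred=True)
forced = p_outage_high(do_deferred=True)
print(f"obs(Deferred): {observed:.4f}  do(Deferred): {forced:.4f}  gap: {observed - forced:.4f}")
```

In this toy setup the observational figure is higher than the interventional one, because seeing a deferral makes high budget pressure, and therefore reduced maintenance, more likely. Forcing the deferral leaves both at their prior levels.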

Answer

Deferral itself raises forced outage risk by 9.8 percentage points — from 27.9% to 37.7% High — but the confound from observational data is smaller than intuition suggests. Observing deferral (telling the model a deferral has been recorded) raises High outage risk to 39.5%, because the model infers that budget pressure is probably elevated (59.0% High) and maintenance quality probably reduced (40.0% Reduced) — the conditions that tend to accompany deferral. Forcing deferral via intervention, while holding budget pressure and maintenance at their natural levels, gives 37.7% High: a 1.8-point gap between what the data shows and what deferral alone causes. The confound is real but modest. What is not modest is the interaction with peak load: do(Deferred) under critical summer load pushes High outage risk to 67.3% and Severe regulatory penalty to 56.3%. The right board conversation is therefore less about whether the data overstates the deferral effect, and more about whether this program is being deferred into a summer where the grid cannot absorb the consequence.
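The arithmetic behind that answer is short enough to check directly. The sketch below uses only the Forced Outage Risk figures quoted in this section; the function name and the derived "confounded share" ratio are illustrative additions, not outputs of the model.

```python
# Forced Outage Risk = High, in percent, as quoted in this section.
BASELINE = 27.9                 # no evidence set
OBSERVED_DEFERRAL = 39.5        # obs(Transformer Replacement = Deferred 2yr)
FORCED_DEFERRAL = 37.7          # do(Transformer Replacement = Deferred 2yr)
FORCED_AT_CRITICAL_LOAD = 67.3  # do(Deferred 2yr) plus obs(Peak Load Stress = Critical)

def decompose():
    """Split the observed gap into a causal part and a confounded part."""
    naive = OBSERVED_DEFERRAL - BASELINE
    causal = FORCED_DEFERRAL - BASELINE
    confounding = OBSERVED_DEFERRAL - FORCED_DEFERRAL
    return {
        "naive_gap": round(naive, 1),
        "causal_effect": round(causal, 1),
        "confounding_gap": round(confounding, 1),
        "confounded_share": round(confounding / naive, 3),
        "critical_load_uplift": round(FORCED_AT_CRITICAL_LOAD - FORCED_DEFERRAL, 1),
    }

for name, value in decompose().items():
    print(f"{name}: {value}")
```

Read this way, roughly 16% of the 11.6-point gap seen in observational data is attributable to the budget environment, and critical peak load adds a further 29.6 points on top of the forced deferral.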

UtilityGridIntervention.bayes
| Image | Obs / Do | Node | Set | Result |
| --- | --- | --- | --- | --- |
| ugr-i-1 | | Forced Outage Risk | | 27.9% High / 33.2% Moderate / 38.9% Low |
| | | Regulatory Penalty | | 30.7% Severe / 25.6% Moderate / 43.8% None |
| ugr-i-2 | obs | Transformer Replacement | Deferred 2yr | |
| | | Capital Budget Pressure | | 59.0% High (from 30.0%) |
| | | Maintenance Quality | | 40.0% Reduced (from 26.3%) |
| | | Equipment Condition | | 59.0% Degraded |
| | | Forced Outage Risk | | 39.5% High |
| | | Regulatory Penalty | | 39.1% Severe |
| ugr-i-3 | do | Transformer Replacement | Deferred 2yr | |
| | | Capital Budget Pressure | | 30.0% High (unchanged; confounder severed) |
| | | Maintenance Quality | | 26.3% Reduced (unchanged) |
| | | Equipment Condition | | 54.2% Degraded |
| | | Forced Outage Risk | | 37.7% High |
| | | Regulatory Penalty | | 37.9% Severe |
| ugr-i-4 | do | Transformer Replacement | Deferred 2yr | |
| | obs | Peak Load Stress | Critical | |
| | | Forced Outage Risk | | 67.3% High |
| | | Regulatory Penalty | | 56.3% Severe |

[Screenshot: Utility Grid Intervention, prior with no evidence set]

Fleet baseline before any query. Forced Outage Risk: 27.9% High, 33.2% Moderate, 38.9% Low.

Rung 1 — Association with Causal Structure

What is actually driving elevated fault frequency on this feeder?

“A feeder serving a hospital district is showing elevated fault frequency this quarter. The operations team says it’s worse storms. The asset team says it’s aging infrastructure. Vegetation management says the corridors are overdue. What does the evidence say?”

At Rung 1 the model runs as a filter: enter what you know about this feeder, read which root causes become more probable. Two confounders in the graph — Seasonal Conditions and Maintenance Budget Sufficiency — are what make the diagnostic discriminate between the three teams rather than just updating each cause proportionally. Seasonal Conditions independently drives both Thermal Loading and Vegetation Encroachment: a hot wet season raises both at once. Maintenance Budget Sufficiency independently drives both Equipment Deterioration and Vegetation Encroachment: constrained budgets cut both asset maintenance and corridor clearing simultaneously. Vegetation Encroachment therefore sits at the intersection of both confounders — confirming it as the cause updates beliefs about season and budget, which propagates to the other two causes. Confirming Thermal Loading above capacity updates season strongly but leaves budget near its prior. The two confirmations produce different footprints across the graph — and that difference tells the board whose hypothesis most fits the full picture.
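The two-confounder structure described above can be imitated with a six-node binary network solved by brute-force enumeration. This is a sketch under stated assumptions: the 20% and 25% priors for Seasonal Conditions and Maintenance Budget Sufficiency come from the spec table, while every conditional probability and the noisy-OR form of the fault node are hypothetical, so the outputs will not match the published figures.

```python
from itertools import product

NODES = ["season", "budget", "thermal", "vegetation", "equipment", "fault"]

P_SEASON = 0.20                               # Severe (prior from the spec table)
P_BUDGET = 0.25                               # Constrained (prior from the spec table)
P_THERMAL = {True: 0.70, False: 0.15}         # hypothetical, given season
P_EQUIPMENT = {True: 0.60, False: 0.20}       # hypothetical, given budget
P_VEGETATION = {                              # hypothetical, given (season, budget)
    (True, True): 0.80, (True, False): 0.50,
    (False, True): 0.50, (False, False): 0.10,
}
LEAK, STRENGTH = 0.05, 0.50                   # hypothetical noisy-OR parameters

def bern(p, value):
    """Probability that a binary variable with P(True) = p takes `value`."""
    return p if value else 1.0 - p

def joint(w):
    """Joint probability of one complete assignment `w` (node name -> bool)."""
    active = sum([w["thermal"], w["vegetation"], w["equipment"]])
    p_fault = 1.0 - (1.0 - LEAK) * (1.0 - STRENGTH) ** active
    return (bern(P_SEASON, w["season"]) * bern(P_BUDGET, w["budget"])
            * bern(P_THERMAL[w["season"]], w["thermal"])
            * bern(P_VEGETATION[(w["season"], w["budget"])], w["vegetation"])
            * bern(P_EQUIPMENT[w["budget"]], w["equipment"])
            * bern(p_fault, w["fault"]))

def posterior(query, evidence):
    """P(query = True | evidence) by summing over all 64 assignments."""
    num = den = 0.0
    for values in product([True, False], repeat=len(NODES)):
        w = dict(zip(NODES, values))
        if any(w[name] != value for name, value in evidence.items()):
            continue
        p = joint(w)
        den += p
        if w[query]:
            num += p
    return num / den

scenarios = {
    "fault only": {"fault": True},
    "fault + vegetation": {"fault": True, "vegetation": True},
    "fault + thermal": {"fault": True, "thermal": True},
}
for label, evidence in scenarios.items():
    print(f"{label}: season={posterior('season', evidence):.3f} "
          f"budget={posterior('budget', evidence):.3f}")
```

Even with invented numbers the qualitative footprint appears: confirming vegetation raises both season and budget, while confirming thermal loading raises season further and lets budget fall back toward its prior, because the fault is already explained.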

Answer

Elevated fault frequency alone shifts all three causes upward but names no single culprit, and the two additional data points produce genuinely different diagnostic signatures. With only Fault Frequency = Elevated entered, Seasonal Conditions rises to 33.7% Severe, Maintenance Budget Sufficiency to 35.4% Constrained, Vegetation to 45.4% Severe, Thermal to 37.1% Above Capacity, and Equipment to 40.6% Advanced. All three teams remain plausible. Confirming Vegetation = Severe then pulls Seasonal Conditions to 49.9% Severe (raising Thermal to 44.2% Above Capacity) and Maintenance Budget to 44.1% Constrained (raising Equipment to 41.7% Advanced), because vegetation sits at the intersection of both confounders. Confirming Thermal = Above Capacity instead pulls Seasonal Conditions harder (67.0% Severe) but lets Maintenance Budget fall back toward its 25.0% prior (30.7% Constrained, down from 35.4%), raising Vegetation to 54.1% Severe while Equipment drops to 34.5% Advanced. The thermal-loading hypothesis points to a seasonal problem; the vegetation hypothesis points to both a seasonal and a budget problem. Those are different operational responses, and the model makes the difference explicit before any inspection is commissioned.
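The two diagnostic signatures are easier to compare as shifts relative to the fault-only state. The sketch below uses only the percentages quoted in this answer; the dictionary layout and the `footprint` helper are illustrative.

```python
# Posterior percentages quoted in this answer (adverse state of each node).
FAULT_ONLY = {
    "Seasonal Conditions": 33.7,
    "Maintenance Budget Sufficiency": 35.4,
    "Vegetation Encroachment": 45.4,
    "Thermal Loading": 37.1,
    "Equipment Deterioration": 40.6,
}
PLUS_VEGETATION = {   # Fault Frequency = Elevated and Vegetation = Severe
    "Seasonal Conditions": 49.9,
    "Maintenance Budget Sufficiency": 44.1,
    "Thermal Loading": 44.2,
    "Equipment Deterioration": 41.7,
}
PLUS_THERMAL = {      # Fault Frequency = Elevated and Thermal = Above Capacity
    "Seasonal Conditions": 67.0,
    "Maintenance Budget Sufficiency": 30.7,
    "Vegetation Encroachment": 54.1,
    "Equipment Deterioration": 34.5,
}

def footprint(after, before=FAULT_ONLY):
    """Percentage-point shift of each reported node relative to `before`."""
    return {node: round(after[node] - before[node], 1) for node in after}

for label, scenario in [("vegetation", PLUS_VEGETATION), ("thermal", PLUS_THERMAL)]:
    print(label)
    for node, shift in footprint(scenario).items():
        print(f"  {node}: {shift:+.1f}")
```

The vegetation confirmation moves every reported node upward, whereas the thermal confirmation moves Seasonal Conditions and Vegetation up while Maintenance Budget and Equipment move down.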

UtilityGridDiagnostic.bayes
| Image | Obs / Do | Node | Set | Result |
| --- | --- | --- | --- | --- |
| ugr-d-1 | | Fault Frequency | | 31.7% Elevated / 31.7% Moderate / 36.6% Low |
| | | Service Interruption | | 20.7% Major |
| | | Regulatory Exposure | | 24.4% High |
| ugr-d-2 | obs | Fault Frequency | Elevated | |
| | | Seasonal Conditions | | 33.7% Severe (from 20.0%) |
| | | Maintenance Budget Sufficiency | | 35.4% Constrained (from 25.0%) |
| | | Vegetation Encroachment | | 45.4% Severe |
| | | Thermal Loading | | 37.1% Above Capacity |
| | | Equipment Deterioration | | 40.6% Advanced |
| | | Service Interruption | | 45.0% Major |
| ugr-d-3 | obs | Fault Frequency | Elevated | |
| | obs | Vegetation Encroachment | Severe | |
| | | Seasonal Conditions | | 49.9% Severe |
| | | Maintenance Budget Sufficiency | | 44.1% Constrained |
| | | Thermal Loading | | 44.2% Above Capacity |
| | | Equipment Deterioration | | 41.7% Advanced |
| | | Regulatory Exposure | | 43.5% High |
| ugr-d-4 | obs | Fault Frequency | Elevated | |
| | obs | Thermal Loading | Above Capacity | |
| | | Seasonal Conditions | | 67.0% Severe |
| | | Maintenance Budget Sufficiency | | 30.7% Constrained |
| | | Vegetation Encroachment | | 54.1% Severe |
| | | Equipment Deterioration | | 34.5% Advanced |
| | | Regulatory Exposure | | 43.5% High |

[Screenshot: Utility Grid Diagnostic, prior with no evidence set]

Root causes at prior marginals. Fault Frequency: 31.7% Elevated, 31.7% Moderate, 36.6% Low. Service Interruption: 20.7% Major.

Utility Grid Counterfactual — Ridgeline Feeder (Rung 3)
Full U-node complement. Set obs(Heat Dome = Extreme) + obs(Feeder Failure = Failed) to anchor to the specific event, then do(Inspection = On Schedule) for the counterfactual.
Utility Grid Intervention — Transformer Replacement (Rung 2)
Compare obs(Deferred) vs do(Deferred) to see the confounder effect of Capital Budget Pressure. Add obs(Peak Load = Critical) for the worst-case scenario.
Utility Grid Diagnostic — Feeder Fault Root Cause (Rung 1)
Set obs(Fault Frequency = Elevated) then add vegetation, thermal, or equipment evidence one at a time to see which root cause the data most supports.

All models require Bayes Server (free edition available). See Download Models for the full library across all case studies.

Next Step

Your asset management team knows which feeders are symptomatic and which capital programs are under pressure. That knowledge needs to be in a causal model before the next board capital approval — not discovered in the post-event review.

The models are free. What I provide is the judgment to build the right structure for your specific network, encode your engineers’ knowledge into it, and turn the output into decisions your board can act on. The discipline stays with your team.

info@rung3.ai

This case study is a composite drawn from published utility reliability literature, NERC event analysis reports, and distribution system risk management practice. Specific figures are representative. No individual utility or engagement is described. The Bayes Server models are working files: download, set evidence, and run inference.