Asset reliability management has a fundamental measurement problem: the successes are invisible. A predictive maintenance program that detects a developing fault and triggers early intervention prevents a failure that never appears in the data. The board sees the cost of the program and the count of failures that occurred. It cannot see the failures that did not occur. Standard practice — comparing failure counts before and after, or against a benchmark fleet — contains a structural error: the fleets that adopt predictive maintenance tend to be higher-risk fleets, so any comparison conflates the effect of the program with the background risk that motivated the investment in the first place.

The same problem appears in interval decisions. When a board observes that fleets running extended overhaul intervals have higher failure rates, it is seeing two effects: the interval extension itself, and the elevated wear rates that tend to accompany the cost pressures that produce interval extensions. Separating those two effects requires a causal model that holds the confounder — fleet utilization pressure — at its natural level while forcing the interval to change.

Analysis ComponentStandard ApproachCausal Approach
Program evaluationCount failures that occurredEstimate failures that would have occurred without the program
Interval extension effectCorrelated with failure rate (confounded by utilization pressure)Isolated via do() — utilization pressure held at prior
Root cause of rising outage rateEach team cites their metricPosterior distribution over causes given the full evidence pattern
PdM selection biasNot in model — high-risk fleets compared to low-risk benchmarksAsset Risk Profile as explicit confounder
Counterfactual failures preventedNot computableEstimated from U-node-anchored individual-level model
The board approved the predictive maintenance program based on projected failure reductions. The only way to know whether it delivered is to estimate the failures that did not happen — and that requires a causal model, not a dashboard.
3 Questions, 3 Rungs
  1. We spent $8M on a predictive maintenance program and still had three unplanned outages — would those outages have occurred without the program? — Rung 3 (Counterfactual). Answering it requires abduction to anchor this specific fleet’s background failure risk via U nodes before removing the PdM program; without U node anchoring the counterfactual reflects a population average fleet, not this one.
  2. If we extend gas turbine overhaul intervals from 24 to 36 months to reduce capex, what does that actually cause to the probability of failure before the next planned intervention? — Rung 2 (Intervention). A do() query severs the confounding path from Fleet Utilization Pressure through the interval decision, isolating the causal effect of the extension from the operational environment that tends to drive both utilization and maintenance deferrals simultaneously.
  3. Our forced outage rate has risen 18% over three years — engineering says fleet aging, operations says increased load, finance says deferred maintenance. What does the asset data actually support? — Rung 1 (Association). The graph encodes which dependencies exist between Asset Age, Load Factor, Maintenance Spend, and outage outcomes; entering the observed trend updates every genuinely connected upstream node, separating what the data supports from what each team is asserting.

Reading the screenshots: a black check mark on a node means it has been set as observed evidence — a fact entered into the model, acting as a filter. A red check mark means it has been set as a do intervention — a decision applied to the model, severing the influence of its parents.

Reading the spec tables: each Run the Analysis block lists the exact steps to reproduce each screenshot in Bayes Server. The Obs / Do column uses three italic control tokens: clear — reset the model to a blank no-evidence state; abduction step — enter the factual observations that anchor the U nodes to this specific case; use abduction result — apply a do() intervention with the U nodes held from the abduction step.

Rung 3 — Counterfactual

Did the program prevent failures we cannot see?

“We spent $8M on a predictive maintenance program last year. We still had three unplanned outages. Would those outages have occurred without the program — or did it actually prevent worse failures that never appear in the data?”

This question conditions on what actually happened — three outages with PdM running — and asks what would have occurred in a world where the program had not been deployed. It cannot be answered by comparing this fleet to a non-PdM benchmark, because fleets that invest in predictive maintenance tend to be higher-risk fleets. The Asset Risk Profile confounder in the model captures this: high-risk fleets adopt PdM precisely because they fail more, so any naive comparison overstates the program's benefit. The model separates those two effects.

The U nodes (shown in orange) represent the unobserved background of this specific fleet — sensor placement quality, analyst expertise, the particular fault modes present this year, crew positioning when failures occurred. By entering the three observed outages as evidence, the model infers what those background factors must have been for this fleet in this year. It then holds those background factors fixed and asks: with everything else about this fleet unchanged, what would the failure count have been without the program?

Answer

The counterfactual comparison shows that removing PdM from this fleet's actual background has negligible effect on Outage Consequence — but that is because the U nodes are also anchored to the observed failure state. Step 2 (obs PdM=Not Implemented + Failure=Three to four) anchors the model: EFDR drops to 63.9% Low, Asset Risk shifts only modestly (28.8/55.4/15.8%), and Outage Consequence shifts to 38.5% Severe / 17.0% Contained. Step 3 shows the fleet without PdM evidence but holding the same failure event: EFDR rises to 47.7% Low vs 63.9% — confirming that the program's direct causal contribution to detection is real. Step 4, do(PdM=Implemented), severs the Asset Risk back-door: Asset Risk jumps to 47.4% High-risk (the model now correctly infers this is a high-risk fleet — only such fleets adopt PdM), EFDR improves to 46.5% High, and the failure distribution returns close to prior (24.3/34.8/30.2/10.7%). The core counterfactual insight: without PdM, this fleet's EFDR falls to 63.9% Low, which is the pathway through which unplanned failures compound.

AssetReliabilityCounterfactual.bayes
ImageObs / DoNodeSetResult
ar-c-1-priorPdM Program36.4% Implemented / 63.6% Not Implemented
Early Fault Detection Rate22.6% High / 31.4% Moderate / 45.9% Low
Unplanned Failure Events24.7% Five+ / 35.1% Three-four / 29.8% One-two / 10.4% None
Outage Consequence35.3% Severe / 37.0% Moderate / 27.7% Contained
ar-c-2obsPdM ProgramNot Implemented
obsUnplanned Failure EventsThree to fourAbduction — fleet background anchored
Asset Risk Profile28.8% High / 55.4% Moderate / 15.8% Low
Early Fault Detection Rate7.5% High / 28.7% Moderate / 63.9% Low
Outage Consequence38.5% Severe / 44.5% Moderate / 17.0% Contained
ar-c-3obsUnplanned Failure EventsThree to fourFailure only — PdM not set (compare to step 2)
Asset Risk Profile30.5% High / 54.0% Moderate / 15.5% Low
Early Fault Detection Rate20.2% High / 32.1% Moderate / 47.7% Low
Outage Consequence38.5% Severe / 44.5% Moderate / 17.0% Contained
ar-c-4doPdM ProgramImplementedSevers Asset Risk back-door
Asset Risk Profile47.4% High — correct inference: only high-risk fleets adopt PdM
Early Fault Detection Rate46.5% High / 36.5% Moderate / 17.0% Low
Unplanned Failure Events24.3% Five+ / 34.8% Three-four / 30.2% One-two / 10.7% None
Outage Consequence35.0% Severe / 37.0% Moderate / 28.0% Contained
Asset Reliability Counterfactual — prior, no evidence set
Prior — no evidence set

Fleet-level baseline. PdM 36.4% Implemented / 63.6% Not Implemented. EFDR 45.9% Low / 22.6% High. Failure Events: 24.7% Five+ / 35.1% Three-four. Outage Consequence 35.3% Severe.

Rung 2 — Intervention

What does extending the overhaul interval actually cause?

“If we extend the overhaul interval on our gas turbines from 24 to 36 months to reduce capex, what does that actually do to the probability of failure before the next planned intervention?”

When a board observes that fleets running extended intervals have higher failure rates, it is seeing two effects at once. Fleet Utilization Pressure independently causes both the decision to extend the interval and the wear rate on the assets — fleets under cost pressure run harder and maintain less. Observing an extended interval tells the model that utilization pressure is probably high, which raises wear rate, which worsens component condition. Forcing the interval to extended via intervention holds utilization pressure at its natural level and asks only what the longer gap between overhauls causes, separate from the operating environment that tends to accompany that decision. The gap between those two numbers is what the board needs before it approves the deferral.

Answer

do(Overhaul = Extended) raises High failure probability from 34.6% to 49.1% — a 14.5-point increase from the interval decision alone, with Fleet Utilization Pressure held at its prior. obs(Overhaul = Extended) produces 51.8% High failure — the 2.7-point gap above do() is the confounding bias: observing an extended interval updates FUP toward High (58.6% vs 30.0% prior), which further elevates wear rate and component condition. Add obs(Operating Environment = Severe) to the do() and failure probability rises to 53.4% High — the environment interaction adds another 4.3 points on top of the interval causal effect. Unplanned Outage Major rises from 33.8% at prior to 43.0% under do(Extended) alone and 45.5% under do(Extended) + Severe. The board is not deciding whether to extend the interval in the abstract — it is deciding whether to extend it on assets already exposed to severe conditions, where the effects compound.

AssetReliabilityIntervention.bayes
ImageObs / DoNodeSetResult
ar-i-1-priorFleet Utilization Pressure30.0% High / 45.0% Moderate / 25.0% Low
Overhaul Interval28.1% Extended / 50.4% Standard / 21.5% Accelerated
Component Condition31.6% Degraded / 36.8% Fair / 31.6% Good
Failure Probability34.6% High / 33.3% Moderate / 32.1% Low
Unplanned Outage33.8% Major / 31.0% Partial / 35.2% Avoided
SAIDI Contribution32.2% Above / 28.2% Near / 39.5% Within
ar-i-3doOverhaul IntervalExtendedFUP stays at prior — back-door severed
Fleet Utilization Pressure30.0% High / 45.0% Moderate / 25.0% Low — unchanged
Component Condition54.8% Degraded / 31.9% Fair / 13.2% Good
Failure Probability49.1% High / 31.3% Moderate / 19.6% Low
Unplanned Outage43.0% Major / 30.6% Partial / 26.4% Avoided
SAIDI Contribution38.5% Above / 28.4% Near / 33.2% Within
ar-i-4obsOverhaul IntervalExtendedCompare to do() — back-door open
Fleet Utilization Pressure58.6% High — up from 30.0%; confound flows back
Component Condition59.9% Degraded — higher than do()
Failure Probability51.8% High — 2.7pp above do(); confounding bias
Unplanned Outage44.5% Major / 25.1% Avoided
ar-i-2doOverhaul IntervalExtended+ environment stress
obsOperating EnvironmentSevere
Wear Rate49.1% Accelerated — environment drives wear
Failure Probability53.4% High — 4.3pp above do() alone
Unplanned Outage45.5% Major / 24.3% Avoided
SAIDI Contribution40.1% Above Threshold
Asset Reliability Intervention — prior, no evidence set
Prior — no evidence set

Fleet baseline. FUP 30% High / 45% Moderate. Overhaul 28.1% Extended / 50.4% Standard. Failure Probability 34.6% High / 32.1% Low. Outage 33.8% Major / 35.2% Avoided. SAIDI 32.2% Above Threshold.

Rung 1 — Association with Causal Structure

Which team is right about the rising outage rate?

“Our forced outage rate has risen 18% over three years. Engineering says it’s fleet aging. Operations says it’s increased load. Finance says it’s deferred maintenance. What does the asset data actually support?”

At Rung 1 the model runs as a filter: enter what the data shows, read which root causes become more probable. Two confounders in the graph make the diagnostic discriminate between the three teams rather than updating each cause proportionally. Asset Program Maturity independently drives both Fleet Age Profile and Maintenance Deferrals — confirming Significant deferrals updates program maturity, which also shifts fleet age beliefs. Grid Demand Growth independently drives both Load Stress and Fleet Age Profile — confirming Above Rating load updates demand growth, which shifts fleet age beliefs through a different mechanism. Each team's evidence produces a different footprint across the graph, and those footprints tell the board whose hypothesis most fits the full pattern of evidence when all three data points are available.

Answer

The elevated outage rate alone cannot name a single cause — but the three teams' evidence produces meaningfully different diagnostic signatures, and the combination points most strongly to fleet aging amplified by demand growth. With only the elevated outage rate entered, all three root causes update modestly upward and no team's hypothesis dominates. Confirming Above Rating load pulls Grid Demand Growth sharply upward, which also shifts Fleet Age Profile — the operations team's evidence implicates engineering's cause through the shared confounder. Confirming Significant maintenance deferrals pulls Asset Program Maturity toward Mature, which shifts both Fleet Age and Deferrals upward — finance's evidence implicates engineering through a different route. The model does not resolve the debate; it shows that all three teams are describing the same underlying pressure through different observational windows, and that the right intervention is at the confounder level — program renewal and load growth management — rather than addressing any single symptom in isolation.

AssetReliabilityDiagnostic.bayes
ImageObs / DoNodeSetResult
ar-d-1-priorForced Outage Rate
Financial Impact
ar-d-2-elevatedobsForced Outage RateElevated
Fleet Age Profile
Load Stress
Maintenance Deferrals
ar-d-3-loadobsForced Outage RateElevated
obsLoad StressAbove Rating
Grid Demand Growth
Fleet Age Profile
ar-d-4-deferralsobsForced Outage RateElevated
obsMaintenance DeferralsSignificant
Asset Program Maturity
Fleet Age Profile
Asset Reliability Diagnostic — prior, no evidence set
Prior — no evidence set

Root causes at fleet prior. Forced Outage Rate, Fleet Age, Load Stress, and Maintenance Deferrals at their baseline marginals.

Asset Reliability Counterfactual — PdM Program (Rung 3)
Full U-node complement. Set obs(PdM = Implemented) + obs(Failure Events = Three to four) to anchor to your fleet's actual year, then do(PdM = Not Implemented) for the counterfactual.
Asset Reliability Intervention — Overhaul Interval (Rung 2)
Compare obs(Extended) vs do(Extended) to see the Fleet Utilization Pressure confound. Add obs(Operating Environment = Severe) for the compounded worst case.
Asset Reliability Diagnostic — Forced Outage Rate (Rung 1)
Set obs(Forced Outage Rate = Elevated) then add load, maintenance, or age evidence one at a time to read which root cause the data most supports.

All models require Bayes Server (free edition available). See Download Models for the full library across all case studies.

Next Step

Your reliability engineers know which assets are symptomatic and which programs are under pressure. That knowledge needs to be in a causal model before the next capital allocation cycle — not discovered after the overhaul deferral produces a forced outage at peak demand.

The models are free. What I provide is the judgment to build the right structure for your specific fleet, encode your engineers’ knowledge into it, and turn the output into decisions your board can act on. The discipline stays with your team.

info@rung3.ai

This case study is a composite drawn from published asset reliability literature, utility and industrial maintenance program reviews, and infrastructure capital planning practice. Specific figures are representative. No individual organization or engagement is described. The Bayes Server models are working files: download, set evidence, and run inference.