
Model Observability & Evaluation

Mock-up · synthetic data

Evaluation Store · live

Performance, training-loop health, and AI-output consumption across all 27 CAT Synapse models. Five tiers — drill from a single event to the whole fleet, then read the AI-generated insight layer.

T1 Per-Event Scorecard

T2 Per-Model Drill-Down

T3 Health Check

T4 Training-Loop Monitor

AI CAT AI Model Insights


Event

Peril / Severity

Wildfire · HIGH

Event class: Likely

Policyholders served

992

100% autonomous · no human gate

Versions tracked

v1 → v3

3 NWS reissues · C-4

Models engaged

of 27 · wildfire model chain

Prediction vs. observed outcome — per model

Model	Predicted	Observed	Confidence	Error

Reading this: error is the gap between the prediction and the ground truth established from field reports and claims. Green = within tolerance, amber = notable, red = material miss requiring review.

Event calibration point

Predicted CRS 7.4 vs. observed impact 7.1 — within tolerance
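The colour coding above can be sketched as a function of absolute error. This is a minimal sketch under stated assumptions: the tolerance thresholds (0.5 and 1.0 CRS points) and the function name `error_band` are hypothetical, because the mock-up does not state the real tolerance values.

```python
# Hypothetical tolerance thresholds, in CRS points; the real values are not specified.
GREEN_MAX = 0.5   # within tolerance
AMBER_MAX = 1.0   # notable, but not yet a material miss


def error_band(predicted: float, observed: float) -> tuple[float, str]:
    """Return the absolute prediction error and its colour band."""
    error = abs(predicted - observed)
    if error <= GREEN_MAX:
        band = "green"
    elif error <= AMBER_MAX:
        band = "amber"
    else:
        band = "red"  # material miss requiring review
    return error, band


# Event calibration point from the scorecard: predicted CRS 7.4 vs. observed 7.1.
error, band = error_band(7.4, 7.1)
print(f"error={error:.1f} band={band}")
```

Using absolute rather than relative error keeps the bands in CRS units, which matches how the calibration point is reported; a relative measure would be a reasonable alternative if scales differ across models.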

Consumption — confidence gate

0.91

mean confidence

HIGH confidence band

Full autonomous response executed — alerts + adjuster pre-positioning, no qualification.
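The confidence gate can be sketched as averaging per-model confidence and mapping the mean to a band that selects the response. Only the HIGH band's behaviour is described in the mock-up; the cut-offs (0.85, 0.60), the MEDIUM and LOW actions, and the name `confidence_gate` are hypothetical.

```python
from statistics import mean

# Hypothetical band cut-offs; the mock-up only reports the HIGH band.
HIGH_MIN = 0.85
MEDIUM_MIN = 0.60

ACTIONS = {
    "HIGH": "full autonomous response: alerts + adjuster pre-positioning",
    "MEDIUM": "autonomous alerts with qualified wording",  # assumption
    "LOW": "widened alerts, escalate to human review",     # assumption
}


def confidence_gate(confidences: list[float]) -> tuple[float, str, str]:
    """Average per-model confidence, map it to a band, and pick the action."""
    avg = mean(confidences)
    if avg >= HIGH_MIN:
        band = "HIGH"
    elif avg >= MEDIUM_MIN:
        band = "MEDIUM"
    else:
        band = "LOW"
    return avg, band, ACTIONS[band]


# Illustrative confidences for three engaged models.
avg, band, action = confidence_gate([0.94, 0.89, 0.90])
print(f"{avg:.2f} {band}: {action}")
```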

Model

Models healthy

23/27

green status

Needs attention

amber · drift or accuracy dip

Circuit-breaker open

M-12 · autonomy suspended

Pending rollbacks

awaiting confirmation

Peril models ranked by false-negative rate — not overall accuracy (spec §3.2)
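Ranking by false-negative rate, FNR = FN / (FN + TP), puts missed catastrophes first; overall accuracy can look high simply because most periods contain no event. A minimal sketch, assuming per-model TP/FN counts are available; the class name, field names, and model IDs below are placeholders, and the counts are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class PerilModelCounts:
    model_id: str
    true_positives: int   # events the model flagged that did materialise
    false_negatives: int  # events that materialised but the model missed

    @property
    def false_negative_rate(self) -> float:
        positives = self.true_positives + self.false_negatives
        # No observed events yet: FNR is undefined; treat as 0.0 so it sorts last.
        return self.false_negatives / positives if positives else 0.0


def rank_by_fnr(models: list[PerilModelCounts]) -> list[PerilModelCounts]:
    """Worst (highest FNR) first, so models that miss events surface at the top."""
    return sorted(models, key=lambda m: m.false_negative_rate, reverse=True)


# Illustrative counts only; "M-A", "M-B", "M-C" are placeholder IDs.
fleet = [
    PerilModelCounts("M-A", true_positives=48, false_negatives=2),
    PerilModelCounts("M-B", true_positives=30, false_negatives=6),
    PerilModelCounts("M-C", true_positives=19, false_negatives=1),
]
for m in rank_by_fnr(fleet):
    print(m.model_id, f"{m.false_negative_rate:.1%}")
```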

Fleet status grid — 27 models by category

Legend: Healthy · Attention · Circuit-breaker open. Each model links to its T2 Per-Model Drill-Down.
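The three grid states can be sketched as a classifier over drift and accuracy: amber for drift or an accuracy dip, circuit-breaker open once drift passes its limit. The numeric thresholds, the `ModelHealth` fields, and the model IDs are hypothetical; the mock-up names the conditions but not the limits.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical thresholds; the mock-up names the conditions but not the limits.
DRIFT_WARN = 0.10    # drift score that turns a model amber
DRIFT_LIMIT = 0.25   # drift score that opens the circuit-breaker
ACCURACY_DIP = 0.02  # fractional accuracy drop vs. baseline that turns a model amber


@dataclass
class ModelHealth:
    model_id: str
    drift: float
    accuracy: float
    baseline_accuracy: float  # assumed > 0


def status(h: ModelHealth) -> str:
    if h.drift > DRIFT_LIMIT:
        return "circuit-breaker open"  # autonomy suspended
    dip = (h.baseline_accuracy - h.accuracy) / h.baseline_accuracy
    if h.drift > DRIFT_WARN or dip > ACCURACY_DIP:
        return "attention"
    return "healthy"


# Illustrative inputs with placeholder IDs.
fleet = [
    ModelHealth("M-A", drift=0.05, accuracy=0.90, baseline_accuracy=0.90),
    ModelHealth("M-B", drift=0.05, accuracy=0.85, baseline_accuracy=0.90),
    ModelHealth("M-C", drift=0.30, accuracy=0.88, baseline_accuracy=0.90),
]
print(Counter(status(h) for h in fleet))
```

Checking the circuit-breaker condition first means a model with severe drift is never reported as merely amber.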

Updates · last 30d

per-event · asynchronous

Improved next event

90% positive trajectory

Auto-rolled back

degraded → reverted

Held — low ground truth

thin claims evidence
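The hold rule can be sketched as a gate on ground-truth quality before an update is applied. This assumes GT quality is measured by counts of field reports and claims; the minimums and the names `GroundTruth` and `gate_update` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical minimum evidence before an update is applied.
MIN_FIELD_REPORTS = 50
MIN_CLAIMS = 20


@dataclass
class GroundTruth:
    field_reports: int
    claims: int


def gate_update(gt: GroundTruth) -> str:
    """Hold an update when its source event has thin ground truth."""
    if gt.field_reports < MIN_FIELD_REPORTS or gt.claims < MIN_CLAIMS:
        return "held"  # wait for more evidence rather than train on noise
    return "applied"


print(gate_update(GroundTruth(field_reports=12, claims=40)))
print(gate_update(GroundTruth(field_reports=80, claims=30)))
```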

Weight-delta magnitude per update

Stability check: deltas should stay small and flat. A growing series signals an unstable loop; the oscillation alert fires when a delta crosses the dashed threshold line.
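The stability check can be sketched as taking the L2 norm of each update's weight change, then flagging when the newest magnitude crosses the threshold or the recent series keeps growing. The threshold, window length, and function names are hypothetical, and the L2 norm is an assumed choice of "magnitude".

```python
import math

OSCILLATION_THRESHOLD = 0.05  # hypothetical position of the dashed line
GROWTH_WINDOW = 3             # hypothetical: this many consecutive increases = "growing"


def delta_magnitude(old: list[float], new: list[float]) -> float:
    """L2 norm of the weight change produced by one update."""
    return math.dist(old, new)


def oscillation_alerts(magnitudes: list[float]) -> dict[str, bool]:
    """Flag the latest delta above threshold and a strictly rising recent tail."""
    above = bool(magnitudes) and magnitudes[-1] > OSCILLATION_THRESHOLD
    tail = magnitudes[-(GROWTH_WINDOW + 1):]
    growing = len(tail) == GROWTH_WINDOW + 1 and all(
        b > a for a, b in zip(tail, tail[1:])
    )
    return {"above_threshold": above, "growing": growing}


print(oscillation_alerts([0.01, 0.012, 0.011, 0.013]))
print(oscillation_alerts([0.01, 0.02, 0.03, 0.06]))
```

The growth check catches an unstable loop before any single delta reaches the line, which is why it sits alongside the threshold rather than replacing it.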

Update log — did it help?

Update	Model	Δ mag.	GT quality	Next-event verdict	State

Post-update performance trajectory

For each update, the chart shows model performance on the next comparable event. This is the decisive training-health test: a dip after an update triggers automatic rollback.
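The next-event test can be sketched as comparing accuracy before and after each update, rolling back on a drop beyond tolerance, and reporting the share of updates that improved (the "positive trajectory" figure). The tolerance, update IDs, and accuracy values are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical tolerance: a relative drop larger than this triggers rollback.
ROLLBACK_TOLERANCE = 0.01


@dataclass
class UpdateOutcome:
    update_id: str
    accuracy_before: float      # last comparable event, prior weights
    accuracy_next_event: float  # next comparable event, updated weights


def relative_change(o: UpdateOutcome) -> float:
    return (o.accuracy_next_event - o.accuracy_before) / o.accuracy_before


def verdict(o: UpdateOutcome) -> str:
    change = relative_change(o)
    if change < -ROLLBACK_TOLERANCE:
        return "rolled back"  # restore prior validated weights
    return "improved" if change > 0 else "kept"


def positive_trajectory(outcomes: list[UpdateOutcome]) -> float:
    """Share of updates that improved performance on the next comparable event."""
    return sum(verdict(o) == "improved" for o in outcomes) / len(outcomes)


# Illustrative outcomes with placeholder IDs.
log = [
    UpdateOutcome("U-a", 0.90, 0.86),
    UpdateOutcome("U-b", 0.80, 0.82),
    UpdateOutcome("U-c", 0.80, 0.796),
]
for o in log:
    print(o.update_id, verdict(o))
print(f"positive trajectory: {positive_trajectory(log):.0%}")
```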

Governance actions — recent

Auto-rollback executed M-08

2026-05-19 14:22 · update U-7731

Heat Wave DSTCE accuracy fell 4.1% on the next event after a CIL update. Prior validated weights restored automatically.

Circuit-breaker tripped M-12

2026-05-17 09:05 · drift threshold

Special Events DSTCE input drift exceeded its limit. Autonomy suspended: predictions are still recorded, alerts widened, and the case escalated to human review. Fail-safe, not fail-silent.
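The fail-safe behaviour can be sketched as a drift check that opens a breaker, after which predictions are still logged but routed to widened alerts and human review. The mock-up does not name its drift metric; the Population Stability Index (PSI) is swapped in here as an assumption, and the 0.25 limit is a common rule of thumb, not the spec value. `CircuitBreaker` is a hypothetical name.

```python
import math

PSI_LIMIT = 0.25  # hypothetical drift limit (rule of thumb, not the spec value)


def psi(expected: list[float], actual: list[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions (proportions)."""
    if len(expected) != len(actual):
        raise ValueError("distributions must have the same number of bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) on empty bins
        total += (a - e) * math.log(a / e)
    return total


class CircuitBreaker:
    def __init__(self) -> None:
        self.open = False
        self.log: list[str] = []

    def check(self, drift: float) -> None:
        if drift > PSI_LIMIT:
            self.open = True  # suspend autonomy

    def handle(self, prediction: float) -> str:
        self.log.append(f"prediction={prediction}")  # always recorded, never silent
        if self.open:
            return "widened alert, escalated to human review"
        return "autonomous response"


breaker = CircuitBreaker()
breaker.check(psi([0.7, 0.2, 0.1], [0.2, 0.3, 0.5]))
print(breaker.handle(6.2))
```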

Update held M-18

2026-05-16 18:40 · low GT quality

ProxDelta update derived from an event with sparse field reports. Held pending more ground truth rather than applied on thin evidence.

Re-validation passed M-01

2026-05-15 11:00 · periodic benchmark

Wildfire DSTCE re-validated against a held-out benchmark that is independent of the CIL. Slow-drift check clear.


CAT AI Model Insights

An LLM-generated analyst layer that reads the Evaluation Store and explains, in plain language, what the observability data shows for a selected model. Every insight is grounded — it cites the specific metrics behind it. This layer is read-only: it interprets and explains, it does not make decisions or retrain models.
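The grounding rule can be sketched as a read-only record type that refuses any insight without at least one cited metric. The class and field names are hypothetical; the example reuses the M-08 rollback figures from the governance log above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the insight layer only reads and explains
class Citation:
    record_id: str  # hypothetical Evaluation Store record key
    metric: str
    value: float


@dataclass(frozen=True)
class Insight:
    model_id: str
    text: str
    citations: tuple[Citation, ...] = ()


def validate(insight: Insight) -> Insight:
    """Reject any insight that does not cite at least one source metric."""
    if not insight.citations:
        raise ValueError(f"ungrounded insight for {insight.model_id}: {insight.text!r}")
    return insight


ok = validate(Insight(
    "M-08",
    "M-08 was rolled back after accuracy fell 4.1% on the next event.",
    (Citation("U-7731", "next_event_accuracy_change", -0.041),),
))
print(ok.citations[0].record_id)
```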

This Model

Health Check

Model


How to read this layer. Insights are generated by a language model from synthetic Evaluation Store data and are illustrative only. The LLM does not gate autonomous actions and is not part of the CIL retraining loop — it is an interpretation aid. Every claim links to its source records; always confirm against the underlying tier before acting.

CAT Sentinel · Model Observability & Evaluation. Mock-up built to CAT Sentinel UI standards. All figures are synthetic, for design illustration only. Five tiers: Per-Event Scorecard · Per-Model Drill-Down · Health Check · Training-Loop Monitor · CAT AI Model Insights. Confidential · CAT Sentinel · May 2026.