# Agent Observability: Health, Drift, Canary, Pacts

One sentence: Four observability capabilities, plus a signed trust attestation, all derived from a single data source (the audit trail), that tell you whether your agents are healthy, whether their behavior is changing, and whether they are following their contract.


| Feature | Question it answers | Endpoint |
| --- | --- | --- |
| Health Score | "Is this agent OK right now?" | `GET /agents/{id}/health` |
| Drift Detection | "Is this agent's behavior changing?" | `GET /agents/{id}/drift` |
| Canary Tests | "Does the pipeline still work as expected?" | `POST /canary/{id}/run` |
| Behavioral Pacts | "Is this agent doing what it's supposed to?" | `GET /agents/{id}/pact/adherence` |
| Trust Attestation | "Can I prove this agent is governed?" | `GET /agents/{id}/attestation` |

All features derive from the existing audit trail. Zero new instrumentation required.


## Health Score

A 0–100 composite score computed from five weighted dimensions:

| Dimension | Weight | What it measures |
| --- | --- | --- |
| Compliance | 30% | Block rate (lower = better), classification distribution, policy violations |
| Data Safety | 25% | PII exposure rate, secret detection rate, output scanning |
| Security | 20% | Injection detection rate, escalation attempts, code execution attempts |
| Stability | 15% | Self-consistency (replaced by Pact Adherence if the agent has a pact) |
| Efficiency | 10% | Cost per call, token usage |

Scoring uses logarithmic penalty curves: a few incidents don’t tank the score, but sustained problems do.
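The exact curve isn't documented here, but a logarithmic penalty of that shape can be sketched in a few lines. The `log_penalty` function and its `scale` constant are illustrative assumptions, not TapPass's actual formula:

```python
import math

def log_penalty(incident_rate: float, scale: float = 20.0) -> float:
    """Map an incident rate (0.0-1.0) onto a score deduction.

    log1p flattens the low end: a handful of incidents costs only a few
    points, while a sustained high rate approaches the full deduction.
    The scale constant is illustrative, not TapPass's real tuning.
    """
    return min(100.0, scale * math.log1p(incident_rate * 100))

# A 2% incident rate is penalized far less than a 50% rate,
# but not 25x less -- the curve compresses small rates.
print(log_penalty(0.02), log_penalty(0.50))
```

With `scale=20`, a 2% rate costs roughly 22 points while a 50% rate costs roughly 79: isolated incidents stay cheap, sustained problems dominate.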

| Grade | Score | Meaning |
| --- | --- | --- |
| A | 90–95 | Excellent: minimal issues |
| B | 80–89 | Good: occasional detections, well-governed |
| C | 70–79 | Needs attention: elevated block or detection rates |
| D | 60–69 | Poor: frequent violations, investigate |
| F | 0–59 | Critical: immediate action required |

Score caps at 95 (perfection isn’t realistic). No data = null, not 100.

```
# Single agent (last 30 days)
GET /agents/{agent_id}/health?period_days=30

# Fleet overview (all agents, worst-first)
GET /health/overview?period_days=30
```

Response:

```json
{
  "agent_id": "support-bot",
  "score": 88.9,
  "grade": "B",
  "trend": "improving",
  "trend_delta": 2.3,
  "dimensions": {
    "compliance": { "score": 95.0, "weight": 0.3, "explanation": "Low block rate (2.1%)" },
    "data_safety": { "score": 88.9, "weight": 0.25, "explanation": "PII detected in 11% of calls" },
    "security": { "score": 78.1, "weight": 0.2, "explanation": "3 injection detections" },
    "pact_adherence": { "score": 87.0, "weight": 0.15, "explanation": "Minor classification overshoot" },
    "efficiency": { "score": 95.0, "weight": 0.1, "explanation": "Avg $0.0023/call" }
  },
  "alerts": [
    { "severity": "warning", "dimension": "security", "message": "3 injection detections in period" }
  ],
  "calls_evaluated": 142,
  "period_days": 30,
  "evaluated_at": "2026-03-06T09:00:00Z"
}
```
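As a sanity check, the top-level score in this response is simply the weighted sum of the five dimension scores; you can reproduce it by hand:

```python
# Dimension scores and weights copied from the example response.
dimensions = {
    "compliance":     (95.0, 0.30),
    "data_safety":    (88.9, 0.25),
    "security":       (78.1, 0.20),
    "pact_adherence": (87.0, 0.15),
    "efficiency":     (95.0, 0.10),
}

score = sum(s * w for s, w in dimensions.values())
print(round(score, 1))  # 88.9 -- matches the top-level "score" field
```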

To derive the trend, the health score computes the same dimensions for the previous period (the window immediately before the current one) and compares:

  • Improving: current score > previous score + 2 points
  • Declining: current score < previous score - 2 points
  • Stable: within ±2 points
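The band logic is simple enough to state as code; this is a sketch of the rule above, not the actual implementation:

```python
def classify_trend(current: float, previous: float, band: float = 2.0) -> str:
    """Label the score trend using the +/-2-point band described above."""
    delta = current - previous
    if delta > band:
        return "improving"
    if delta < -band:
        return "declining"
    return "stable"

# The example response (score 88.9, trend_delta +2.3) classifies as improving:
print(classify_trend(88.9, 86.6))
```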

## Drift Detection

Drift detection compares the agent's current behavior window (last 7 days) against a baseline period (the previous 30 days) across 8 signals:

| Signal | Method | Weight |
| --- | --- | --- |
| Classification | Jensen-Shannon divergence on distribution | 20% |
| Block rate | Absolute + relative change | 20% |
| PII rate | Absolute + relative change | 15% |
| Secret rate | Absolute + relative change | 10% |
| Injection rate | Absolute + relative change | 10% |
| Tool usage | Jaccard distance on feature set | 10% |
| Cost | Relative change in avg cost/call | 5% |
| Model distribution | Distribution distance + new-model detection | 10% |
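For the classification signal, Jensen-Shannon divergence compares the baseline and current label distributions. A minimal pure-Python version (base 2, so the value lies in [0, 1]; the example distributions are made up for illustration):

```python
import math

def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence between two discrete distributions (base 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):  # Kullback-Leibler divergence, skipping zero-probability terms
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Baseline vs. current classification mix (PUBLIC, INTERNAL, CONFIDENTIAL):
baseline = [0.70, 0.25, 0.05]
current  = [0.55, 0.35, 0.10]
print(round(js_divergence(baseline, current), 3))  # small but nonzero drift
```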
The weighted signals combine into a 0–100 drift score:

| Level | Score | Meaning |
| --- | --- | --- |
| Stable | 0–20 | Normal variation |
| Minor | 20–40 | Some signals shifted: monitor |
| Significant | 40–60 | Meaningful behavioral change: investigate |
| Major | 60–100 | Fundamentally different behavior: act now |
```
# Single agent
GET /agents/{agent_id}/drift?current_days=7&baseline_days=30

# Fleet overview
GET /drift/overview
```

Response:

```json
{
  "drift_score": 18.5,
  "drift_level": "stable",
  "signals": [
    { "signal": "classification", "drift": 0.12, "description": "JS distance: 0.120" },
    { "signal": "block_rate", "drift": 0.05, "description": "1.2% → 1.8%" },
    { "signal": "model_distribution", "drift": 0.0, "description": "Models: gpt-4o-mini" }
  ],
  "alerts": [],
  "baseline_period": "2026-02-04 to 2026-02-27",
  "current_period": "2026-02-27 to 2026-03-06",
  "baseline_calls": 523,
  "current_calls": 87
}
```
Default alert triggers:

  • Classification shift > 5% in any category
  • Block rate increase > 5 percentage points
  • A new model appearing (silent provider update)
  • Cost increase > 50%
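These triggers reduce to a few comparisons between baseline and current aggregates. A sketch covering three of them; the field names and dict structure are assumptions, not the real API:

```python
def drift_alerts(baseline: dict, current: dict) -> list[str]:
    """Apply three of the default drift triggers. Field names are illustrative."""
    alerts = []
    if abs(current["block_rate"] - baseline["block_rate"]) > 0.05:
        alerts.append("block rate shifted > 5 percentage points")
    if set(current["models"]) - set(baseline["models"]):
        alerts.append("new model appeared (possible silent provider update)")
    if current["avg_cost"] > baseline["avg_cost"] * 1.5:
        alerts.append("cost increased > 50%")
    return alerts

alerts = drift_alerts(
    {"block_rate": 0.012, "models": {"gpt-4o-mini"}, "avg_cost": 0.0023},
    {"block_rate": 0.018, "models": {"gpt-4o-mini", "gpt-4o"}, "avg_cost": 0.0041},
)
print(alerts)  # new model + cost jump fire; the small block-rate shift does not
```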

## Canary Tests

Canaries send known prompts through the full governance pipeline on a schedule, then compare results against a stored baseline to detect:

  • Silent model weight updates from providers
  • Policy configuration changes
  • Detection threshold drift from adaptive tuning
  • Pipeline step failures
```
# 1. Create a canary test
POST /canary
{
  "id": "clean-prompt",
  "agent_id": "support-bot",
  "prompt": "What are your business hours?",
  "expectations": {
    "classification": "PUBLIC",
    "no_block": true,
    "no_pii": true,
    "max_cost_usd": 0.01
  },
  "schedule_hours": 6
}

# 2. Run it and set the baseline
POST /canary/clean-prompt/baseline

# 3. Run again (checks against baseline + expectations)
POST /canary/clean-prompt/run

# 4. Run all due scheduled canaries
POST /canary/run-due
```
| Field | Severity | Example |
| --- | --- | --- |
| `blocked` (was passing) | critical | Pipeline config change blocks legitimate traffic |
| `secrets_detected` (new) | critical | Model leaking secrets it didn't before |
| classification changed | warning | Model classifying data differently |
| `pii_detected` (new) | warning | Model generating PII it didn't before |
| `injection_detected` (new) | warning | Canary now triggers the injection scanner |
| model changed | info | Provider updated the model silently |
| `cost_usd` > 50% change | info | Price change or token-usage shift |
  • Expectations are absolute: “this canary must not be blocked” → hard fail if violated
  • Regressions are relative: “this changed from the baseline” → only critical regressions fail the test
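The distinction can be expressed as a pass/fail rule. This is a simplified sketch with illustrative field names, covering only a couple of the checks from the table above:

```python
def canary_passes(result: dict, expectations: dict, baseline: dict) -> bool:
    """Expectations are absolute requirements; baseline regressions
    fail the run only when they are critical."""
    # Absolute expectations: any violation is a hard fail.
    if expectations.get("no_block") and result["blocked"]:
        return False
    if result["cost_usd"] > expectations.get("max_cost_usd", float("inf")):
        return False
    # Relative regressions: only the critical ones fail the test.
    critical_regression = (
        (result["blocked"] and not baseline["blocked"])
        or (result["secrets_detected"] and not baseline["secrets_detected"])
    )
    return not critical_regression

run = {"blocked": False, "cost_usd": 0.004, "secrets_detected": False}
base = {"blocked": False, "secrets_detected": False}
print(canary_passes(run, {"no_block": True, "max_cost_usd": 0.01}, base))  # True
```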

## Behavioral Pacts

See the Behavioral Pacts guide for full scenario walkthroughs.

```
# Set a pact
PUT /agents/support-bot/pact
{
  "purpose": "Answer customer questions about products and orders",
  "expected_classification": "INTERNAL",
  "expected_pii_exposure": "incidental",
  "intended_tools": ["search_products", "lookup_order"],
  "intended_operations": ["read"],
  "expected_cost_per_call_usd": 0.005,
  "expected_block_rate": 0.02,
  "ai_act_risk_level": "limited",
  "gdpr_legal_basis": "legitimate_interest",
  "data_subjects": "customers"
}

# Check adherence
GET /agents/support-bot/pact/adherence?period_days=30
```
  • Pipeline = enforcement (seatbelt): what's ALLOWED. Hard blocks.
  • Pact = intent measurement (speed limit): what's INTENDED. Soft measurement.
  • No retroactive judgment: when a pact is tightened, old calls are scored against the old pact.
  • Pre-pact calls get a free pass: calls before `effective_from` are skipped entirely.
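The last two rules amount to version-aware pact lookup. A sketch, where the versioned-pact list and its field names are assumptions made for illustration:

```python
from datetime import datetime, timezone

def pact_for_call(call_time: datetime, pact_versions: list[dict]):
    """Return the pact version in force at call time, or None.

    None means the call predates the first pact and is skipped entirely;
    a later, tighter pact never re-scores older calls.
    """
    applicable = [p for p in pact_versions if p["effective_from"] <= call_time]
    return max(applicable, key=lambda p: p["effective_from"]) if applicable else None

v1 = {"effective_from": datetime(2026, 1, 1, tzinfo=timezone.utc), "expected_block_rate": 0.05}
v2 = {"effective_from": datetime(2026, 3, 1, tzinfo=timezone.utc), "expected_block_rate": 0.02}

print(pact_for_call(datetime(2025, 12, 1, tzinfo=timezone.utc), [v1, v2]))  # None: pre-pact
print(pact_for_call(datetime(2026, 2, 1, tzinfo=timezone.utc), [v1, v2]))   # v1: old pact applies
```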

## Trust Attestation

A signed ES256 JWT containing the agent's governance metrics, verifiable by anyone with the TapPass JWKS public key.

```
GET /agents/support-bot/attestation?period_days=30
```

Response:

```json
{
  "attestation": {
    "agent_id": "support-bot",
    "health_score": 88.9,
    "health_grade": "B",
    "compliance_level": "standard",
    "drift_level": "stable",
    "drift_score": 12.3,
    "pact_adherence": 87.0,
    "calls_evaluated": 142,
    "attested_at": "2026-03-06T09:00:00Z",
    "valid_for_seconds": 3600
  },
  "jwt": "eyJhbGciOiJFUzI1NiIs...",
  "verify_at": "/jwks"
}
```
| Tier | Requirements |
| --- | --- |
| Starter | Any score |
| Standard | Health ≥ 80 + drift stable or minor |
| Regulated | Health ≥ 90 + drift stable + pact adherence ≥ 90 |
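The tier requirements translate directly into a cascading check. Illustrative logic, not the actual tiering code:

```python
def compliance_tier(health: float, drift_level: str, pact_adherence) -> str:
    """Map observability metrics onto the attestation tiers above."""
    if health >= 90 and drift_level == "stable" and (pact_adherence or 0) >= 90:
        return "regulated"
    if health >= 80 and drift_level in ("stable", "minor"):
        return "standard"
    return "starter"

# The example agent (health 88.9, stable drift, adherence 87.0) lands on "standard":
print(compliance_tier(88.9, "stable", 87.0))
```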

Third parties verify the JWT using TapPass’s JWKS endpoint:

```sh
# Get the public key
curl localhost:9620/jwks
```

```python
# Verify in Python with PyJWT, which resolves the signing key
# directly from the JWKS endpoint:
import jwt

jwks_client = jwt.PyJWKClient("https://tappass.example.com/jwks")
signing_key = jwks_client.get_signing_key_from_jwt(token)
claims = jwt.decode(token, signing_key.key, algorithms=["ES256"])
```

## Issue Codes

All tool risks, toxic flows, and rug-pull detections carry referenceable issue codes.

| Code | Severity | What it catches |
| --- | --- | --- |
| E001 | critical | Tool poisoning: hidden instructions in tool description |
| T001 | medium+ | Destructive tool: can modify or destroy data |
| T002 | medium+ | Public sink: can send data externally |
| T003 | medium+ | Private data access: accesses credentials or personal data |
| T004 | medium | Untrusted content source: ingests external content |
| T005 | high | Forbidden zone access: tool references credential/crypto paths |
| Code | Severity | Pattern |
| --- | --- | --- |
| TF001 | high | Private data → public sink (data exfiltration) |
| TF002 | high | Untrusted content → destructive action (confused deputy) |
| TF003 | medium | Untrusted content → public sink (proxy abuse) |
| Code | Severity | What changed |
| --- | --- | --- |
| RP001 | critical | Tool definition modified between assessments |
| RP002 | medium | New tool appeared on server |
| RP003 | high | Tool removed (possible cover-up) |
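Toxic flows (TF001–TF003) are combinations of per-tool capabilities rather than properties of any single tool. A sketch of that pattern matching; the capability flag names are assumptions, not the real schema:

```python
def toxic_flows(tools: list[dict]) -> list[str]:
    """Flag toxic capability combinations across a tool set (TF001-TF003)."""
    def has(flag: str) -> bool:
        return any(t.get(flag) for t in tools)

    findings = []
    if has("private_data") and has("public_sink"):
        findings.append("TF001: private data -> public sink (data exfiltration)")
    if has("untrusted_content") and has("destructive"):
        findings.append("TF002: untrusted content -> destructive action (confused deputy)")
    if has("untrusted_content") and has("public_sink"):
        findings.append("TF003: untrusted content -> public sink (proxy abuse)")
    return findings

tools = [
    {"name": "read_vault", "private_data": True},
    {"name": "post_webhook", "public_sink": True},
]
print(toxic_flows(tools))  # only TF001 fires for this pair
```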

## Dashboard

The TapPass dashboard visualizes all observability data:

  • Health badge per agent: color-coded circle (0–100) + letter grade
  • Green (A), dark green (B), amber (C), orange (D), red (F)
| Panel | Content |
| --- | --- |
| 🩺 Health Score | SVG score ring, 5-dimension bars, trend, alerts |
| 📡 Behavioral Drift | Radar chart (8 signals), drift level, signal bars, period info |
| 📋 Behavioral Pact | Purpose, classification, PII exposure, cost budget, AI Act risk, GDPR basis, tools |
| 📐 Pact Adherence | Score ring, violation list with penalties, bar chart, pre-pact skip count |

All panels load data from the API on page visit: no polling or websockets needed.


## Architecture

```
           ┌──────────────────────────────┐
           │         Audit Trail          │
           │ (hash-chained, tamper-proof) │
           └──────────────┬───────────────┘
          ┌───────────────┼───────────────┐
          │               │               │
  ┌───────▼───────┐ ┌─────▼──────┐ ┌──────▼──────┐
  │ Health Score  │ │   Drift    │ │   Canary    │
  │   (5 dims)    │ │  (8 sigs)  │ │  (runner)   │
  └───────┬───────┘ └─────┬──────┘ └──────┬──────┘
          │               │               │
  ┌───────▼───────┐ ┌─────▼──────┐ ┌──────▼──────┐
  │     Pact      │ │   Alerts   │ │ Regression  │
  │  (adherence)  │ │            │ │  Detection  │
  └───────┬───────┘ └────────────┘ └─────────────┘
          │
  ┌───────▼───────┐
  │  Attestation  │
  │ (signed JWT)  │
  └───────────────┘
```

All modules import `_query_agent_events` and `_extract_calls` from `tappass.observability.health`: a single data-extraction layer over the audit trail.


| File | Purpose |
| --- | --- |
| `tappass/observability/health.py` | Health score computation |
| `tappass/observability/drift.py` | Drift detection (8 signals, JSD) |
| `tappass/registry/pact.py` | Pact model, store, adherence scoring |
| `tappass/canary/store.py` | Canary test definitions + baseline storage |
| `tappass/canary/runner.py` | Canary execution + regression detection |
| `tappass/api/routes/agent_health.py` | All 15 API endpoints |
| `tappass/policy/tokens.py` | Trust attestation in capability tokens |
| `frontend/src/pages/Agents.tsx` | Dashboard visualizations |
| `docs/pacts-guide.md` | Behavioral pacts guide |
| `docs/observability-guide.md` | This file |