# Agent Observability: Health, Drift, Canary, Pacts

One sentence: Four observability capabilities, plus a signed trust attestation, all derived from a single data source (the audit trail), that tell you whether your agents are healthy, whether their behavior is changing, and whether they are following their contract.


| Feature | Question it answers | Endpoint |
| --- | --- | --- |
| Health Score | "Is this agent OK right now?" | `GET /agents/{id}/health` |
| Drift Detection | "Is this agent's behavior changing?" | `GET /agents/{id}/drift` |
| Canary Tests | "Does the pipeline still work as expected?" | `POST /canary/{id}/run` |
| Behavioral Pacts | "Is this agent doing what it's supposed to?" | `GET /agents/{id}/pact/adherence` |
| Trust Attestation | "Can I prove this agent is governed?" | `GET /agents/{id}/attestation` |

All features derive from the existing audit trail. Zero new instrumentation required.


## Health Score

A 0–100 composite score computed from five weighted dimensions:

| Dimension | Weight | What it measures |
| --- | --- | --- |
| Compliance | 30% | Block rate (lower = better), classification distribution, policy violations |
| Data Safety | 25% | PII exposure rate, secret detection rate, output scanning |
| Security | 20% | Injection detection rate, escalation attempts, code execution attempts |
| Stability | 15% | Self-consistency (replaced by Pact Adherence if the agent has a pact) |
| Efficiency | 10% | Cost per call, token usage |

Scoring uses logarithmic penalty curves: a few incidents don’t tank the score, but sustained problems do.
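The exact curve isn't documented here, but a logarithmic penalty of that shape can be sketched in a few lines. The `log_penalty` function and its `scale` constant are illustrative assumptions, not TapPass's actual formula:

```python
import math

def log_penalty(incident_rate: float, scale: float = 20.0) -> float:
    """Map an incident rate (0.0-1.0) onto a score deduction.

    log1p flattens the low end: a handful of incidents costs only a few
    points, while a sustained high rate approaches the full deduction.
    The scale constant is illustrative, not TapPass's real tuning.
    """
    return min(100.0, scale * math.log1p(incident_rate * 100))

# A 2% incident rate is penalized far less than a 50% rate,
# but not 25x less -- the curve compresses small rates.
print(log_penalty(0.02), log_penalty(0.50))
```

With `scale=20`, a 2% rate costs roughly 22 points while a 50% rate costs roughly 79: isolated incidents stay cheap, sustained problems dominate.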

| Grade | Score | Meaning |
| --- | --- | --- |
| A | 90–95 | Excellent: minimal issues |
| B | 80–89 | Good: occasional detections, well-governed |
| C | 70–79 | Needs attention: elevated block or detection rates |
| D | 60–69 | Poor: frequent violations, investigate |
| F | 0–59 | Critical: immediate action required |

Score caps at 95 (perfection isn’t realistic). No data = null, not 100.

```
# Single agent (last 30 days)
GET /agents/{agent_id}/health?period_days=30

# Fleet overview (all agents, worst-first)
GET /health/overview?period_days=30
```

Response:

```json
{
  "agent_id": "support-bot",
  "score": 88.9,
  "grade": "B",
  "trend": "improving",
  "trend_delta": 2.3,
  "dimensions": {
    "compliance": { "score": 95.0, "weight": 0.3, "explanation": "Low block rate (2.1%)" },
    "data_safety": { "score": 88.9, "weight": 0.25, "explanation": "PII detected in 11% of calls" },
    "security": { "score": 78.1, "weight": 0.2, "explanation": "3 injection detections" },
    "pact_adherence": { "score": 87.0, "weight": 0.15, "explanation": "Minor classification overshoot" },
    "efficiency": { "score": 95.0, "weight": 0.1, "explanation": "Avg $0.0023/call" }
  },
  "alerts": [
    { "severity": "warning", "dimension": "security", "message": "3 injection detections in period" }
  ],
  "calls_evaluated": 142,
  "period_days": 30,
  "evaluated_at": "2026-03-06T09:00:00Z"
}
```
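As a sanity check, the top-level score in this response is simply the weighted sum of the five dimension scores; you can reproduce it by hand:

```python
# Dimension scores and weights copied from the example response.
dimensions = {
    "compliance":     (95.0, 0.30),
    "data_safety":    (88.9, 0.25),
    "security":       (78.1, 0.20),
    "pact_adherence": (87.0, 0.15),
    "efficiency":     (95.0, 0.10),
}

score = sum(s * w for s, w in dimensions.values())
print(round(score, 1))  # 88.9 -- matches the top-level "score" field
```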

To derive the trend, the health score computes the same dimensions for the previous period (the window immediately before the current one) and compares:

  • Improving: current score > previous score + 2 points
  • Declining: current score < previous score - 2 points
  • Stable: within ±2 points
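The band logic is simple enough to state as code; this is a sketch of the rule above, not the actual implementation:

```python
def classify_trend(current: float, previous: float, band: float = 2.0) -> str:
    """Label the score trend using the +/-2-point band described above."""
    delta = current - previous
    if delta > band:
        return "improving"
    if delta < -band:
        return "declining"
    return "stable"

# The example response (score 88.9, trend_delta +2.3) classifies as improving:
print(classify_trend(88.9, 86.6))
```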

## Drift Detection

Drift detection compares the agent's current behavior window (last 7 days) against a baseline period (the previous 30 days) across 8 signals:

| Signal | Method | Weight |
| --- | --- | --- |
| Classification | Jensen-Shannon divergence on distribution | 20% |
| Block rate | Absolute + relative change | 20% |
| PII rate | Absolute + relative change | 15% |
| Secret rate | Absolute + relative change | 10% |
| Injection rate | Absolute + relative change | 10% |
| Tool usage | Jaccard distance on feature set | 10% |
| Cost | Relative change in avg cost/call | 5% |
| Model distribution | Distribution distance + new-model detection | 10% |
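For the classification signal, Jensen-Shannon divergence compares the baseline and current label distributions. A minimal pure-Python version (base 2, so the value lies in [0, 1]; the example distributions are made up for illustration):

```python
import math

def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence between two discrete distributions (base 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):  # Kullback-Leibler divergence, skipping zero-probability terms
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Baseline vs. current classification mix (PUBLIC, INTERNAL, CONFIDENTIAL):
baseline = [0.70, 0.25, 0.05]
current  = [0.55, 0.35, 0.10]
print(round(js_divergence(baseline, current), 3))  # small but nonzero drift
```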
The weighted signals combine into a 0–100 drift score:

| Level | Score | Meaning |
| --- | --- | --- |
| Stable | 0–20 | Normal variation |
| Minor | 20–40 | Some signals shifted: monitor |
| Significant | 40–60 | Meaningful behavioral change: investigate |
| Major | 60–100 | Fundamentally different behavior: act now |
```
# Single agent
GET /agents/{agent_id}/drift?current_days=7&baseline_days=30

# Fleet overview
GET /drift/overview
```

Response:

```json
{
  "drift_score": 18.5,
  "drift_level": "stable",
  "signals": [
    { "signal": "classification", "drift": 0.12, "description": "JS distance: 0.120" },
    { "signal": "block_rate", "drift": 0.05, "description": "1.2% → 1.8%" },
    { "signal": "model_distribution", "drift": 0.0, "description": "Models: gpt-4o-mini" }
  ],
  "alerts": [],
  "baseline_period": "2026-02-04 to 2026-02-27",
  "current_period": "2026-02-27 to 2026-03-06",
  "baseline_calls": 523,
  "current_calls": 87
}
```
Default alert triggers:

  • Classification shift > 5% in any category
  • Block rate increase > 5 percentage points
  • A new model appearing (silent provider update)
  • Cost increase > 50%
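These triggers reduce to a few comparisons between baseline and current aggregates. A sketch covering three of them; the field names and dict structure are assumptions, not the real API:

```python
def drift_alerts(baseline: dict, current: dict) -> list[str]:
    """Apply three of the default drift triggers. Field names are illustrative."""
    alerts = []
    if abs(current["block_rate"] - baseline["block_rate"]) > 0.05:
        alerts.append("block rate shifted > 5 percentage points")
    if set(current["models"]) - set(baseline["models"]):
        alerts.append("new model appeared (possible silent provider update)")
    if current["avg_cost"] > baseline["avg_cost"] * 1.5:
        alerts.append("cost increased > 50%")
    return alerts

alerts = drift_alerts(
    {"block_rate": 0.012, "models": {"gpt-4o-mini"}, "avg_cost": 0.0023},
    {"block_rate": 0.018, "models": {"gpt-4o-mini", "gpt-4o"}, "avg_cost": 0.0041},
)
print(alerts)  # new model + cost jump fire; the small block-rate shift does not
```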

## Canary Tests

Canaries send known prompts through the full governance pipeline on a schedule, then compare results against a stored baseline to detect:

  • Silent model weight updates from providers
  • Policy configuration changes
  • Detection threshold drift from adaptive tuning
  • Pipeline step failures
```
# 1. Create a canary test
POST /canary
{
  "id": "clean-prompt",
  "agent_id": "support-bot",
  "prompt": "What are your business hours?",
  "expectations": {
    "classification": "PUBLIC",
    "no_block": true,
    "no_pii": true,
    "max_cost_usd": 0.01
  },
  "schedule_hours": 6
}

# 2. Run it and set the baseline
POST /canary/clean-prompt/baseline

# 3. Run again (checks against baseline + expectations)
POST /canary/clean-prompt/run

# 4. Run all due scheduled canaries
POST /canary/run-due
```
| Field | Severity | Example |
| --- | --- | --- |
| `blocked` (was passing) | critical | Pipeline config change blocks legitimate traffic |
| `secrets_detected` (new) | critical | Model leaking secrets it didn't before |
| classification changed | warning | Model classifying data differently |
| `pii_detected` (new) | warning | Model generating PII it didn't before |
| `injection_detected` (new) | warning | Canary now triggers the injection scanner |
| model changed | info | Provider updated the model silently |
| `cost_usd` > 50% change | info | Price change or token-usage shift |
  • Expectations are absolute: “this canary must not be blocked” → hard fail if violated
  • Regressions are relative: “this changed from the baseline” → only critical regressions fail the test
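The distinction can be expressed as a pass/fail rule. This is a simplified sketch with illustrative field names, covering only a couple of the checks from the table above:

```python
def canary_passes(result: dict, expectations: dict, baseline: dict) -> bool:
    """Expectations are absolute requirements; baseline regressions
    fail the run only when they are critical."""
    # Absolute expectations: any violation is a hard fail.
    if expectations.get("no_block") and result["blocked"]:
        return False
    if result["cost_usd"] > expectations.get("max_cost_usd", float("inf")):
        return False
    # Relative regressions: only the critical ones fail the test.
    critical_regression = (
        (result["blocked"] and not baseline["blocked"])
        or (result["secrets_detected"] and not baseline["secrets_detected"])
    )
    return not critical_regression

run = {"blocked": False, "cost_usd": 0.004, "secrets_detected": False}
base = {"blocked": False, "secrets_detected": False}
print(canary_passes(run, {"no_block": True, "max_cost_usd": 0.01}, base))  # True
```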

## Behavioral Pacts

See the Behavioral Pacts guide for full scenario walkthroughs.

```
# Set a pact
PUT /agents/support-bot/pact
{
  "purpose": "Answer customer questions about products and orders",
  "expected_classification": "INTERNAL",
  "expected_pii_exposure": "incidental",
  "intended_tools": ["search_products", "lookup_order"],
  "intended_operations": ["read"],
  "expected_cost_per_call_usd": 0.005,
  "expected_block_rate": 0.02,
  "ai_act_risk_level": "limited",
  "gdpr_legal_basis": "legitimate_interest",
  "data_subjects": "customers"
}

# Check adherence
GET /agents/support-bot/pact/adherence?period_days=30
```
  • Pipeline = enforcement (seatbelt): what's ALLOWED. Hard blocks.
  • Pact = intent measurement (speed limit): what's INTENDED. Soft measurement.
  • No retroactive judgment: when a pact is tightened, old calls are scored against the old pact.
  • Pre-pact calls get a free pass: calls before `effective_from` are skipped entirely.
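The last two rules amount to version-aware pact lookup. A sketch, where the versioned-pact list and its field names are assumptions made for illustration:

```python
from datetime import datetime, timezone

def pact_for_call(call_time: datetime, pact_versions: list[dict]):
    """Return the pact version in force at call time, or None.

    None means the call predates the first pact and is skipped entirely;
    a later, tighter pact never re-scores older calls.
    """
    applicable = [p for p in pact_versions if p["effective_from"] <= call_time]
    return max(applicable, key=lambda p: p["effective_from"]) if applicable else None

v1 = {"effective_from": datetime(2026, 1, 1, tzinfo=timezone.utc), "expected_block_rate": 0.05}
v2 = {"effective_from": datetime(2026, 3, 1, tzinfo=timezone.utc), "expected_block_rate": 0.02}

print(pact_for_call(datetime(2025, 12, 1, tzinfo=timezone.utc), [v1, v2]))  # None: pre-pact
print(pact_for_call(datetime(2026, 2, 1, tzinfo=timezone.utc), [v1, v2]))   # v1: old pact applies
```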

## Trust Attestation

A signed ES256 JWT containing the agent's governance metrics, verifiable by anyone with the TapPass JWKS public key.

```
GET /agents/support-bot/attestation?period_days=30
```

Response:

```json
{
  "attestation": {
    "agent_id": "support-bot",
    "health_score": 88.9,
    "health_grade": "B",
    "compliance_level": "standard",
    "drift_level": "stable",
    "drift_score": 12.3,
    "pact_adherence": 87.0,
    "calls_evaluated": 142,
    "attested_at": "2026-03-06T09:00:00Z",
    "valid_for_seconds": 3600
  },
  "jwt": "eyJhbGciOiJFUzI1NiIs...",
  "verify_at": "/jwks"
}
```
| Tier | Requirements |
| --- | --- |
| Starter | Any score |
| Standard | Health ≥ 80 + drift stable or minor |
| Regulated | Health ≥ 90 + drift stable + pact adherence ≥ 90 |
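The tier requirements translate directly into a cascading check. Illustrative logic, not the actual tiering code:

```python
def compliance_tier(health: float, drift_level: str, pact_adherence) -> str:
    """Map observability metrics onto the attestation tiers above."""
    if health >= 90 and drift_level == "stable" and (pact_adherence or 0) >= 90:
        return "regulated"
    if health >= 80 and drift_level in ("stable", "minor"):
        return "standard"
    return "starter"

# The example agent (health 88.9, stable drift, adherence 87.0) lands on "standard":
print(compliance_tier(88.9, "stable", 87.0))
```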

Third parties verify the JWT using TapPass’s JWKS endpoint:

```sh
# Get the public key
curl localhost:9620/jwks
```

```python
# Verify in Python with PyJWT, which resolves the signing key
# directly from the JWKS endpoint:
import jwt

jwks_client = jwt.PyJWKClient("https://tappass.example.com/jwks")
signing_key = jwks_client.get_signing_key_from_jwt(token)
claims = jwt.decode(token, signing_key.key, algorithms=["ES256"])
```

## Issue Codes

All tool risks, toxic flows, and rug-pull detections carry referenceable issue codes.

| Code | Severity | What it catches |
| --- | --- | --- |
| E001 | critical | Tool poisoning: hidden instructions in tool description |
| T001 | medium+ | Destructive tool: can modify or destroy data |
| T002 | medium+ | Public sink: can send data externally |
| T003 | medium+ | Private data access: accesses credentials or personal data |
| T004 | medium | Untrusted content source: ingests external content |
| T005 | high | Forbidden zone access: tool references credential/crypto paths |
| Code | Severity | Pattern |
| --- | --- | --- |
| TF001 | high | Private data → public sink (data exfiltration) |
| TF002 | high | Untrusted content → destructive action (confused deputy) |
| TF003 | medium | Untrusted content → public sink (proxy abuse) |
| Code | Severity | What changed |
| --- | --- | --- |
| RP001 | critical | Tool definition modified between assessments |
| RP002 | medium | New tool appeared on server |
| RP003 | high | Tool removed (possible cover-up) |
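Toxic flows (TF001–TF003) are combinations of per-tool capabilities rather than properties of any single tool. A sketch of that pattern matching; the capability flag names are assumptions, not the real schema:

```python
def toxic_flows(tools: list[dict]) -> list[str]:
    """Flag toxic capability combinations across a tool set (TF001-TF003)."""
    def has(flag: str) -> bool:
        return any(t.get(flag) for t in tools)

    findings = []
    if has("private_data") and has("public_sink"):
        findings.append("TF001: private data -> public sink (data exfiltration)")
    if has("untrusted_content") and has("destructive"):
        findings.append("TF002: untrusted content -> destructive action (confused deputy)")
    if has("untrusted_content") and has("public_sink"):
        findings.append("TF003: untrusted content -> public sink (proxy abuse)")
    return findings

tools = [
    {"name": "read_vault", "private_data": True},
    {"name": "post_webhook", "public_sink": True},
]
print(toxic_flows(tools))  # only TF001 fires for this pair
```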

## Dashboard

The TapPass dashboard visualizes all observability data:

  • Health badge per agent: color-coded circle (0–100) + letter grade
  • Green (A), dark green (B), amber (C), orange (D), red (F)
| Panel | Content |
| --- | --- |
| 🩺 Health Score | SVG score ring, 5-dimension bars, trend, alerts |
| 📡 Behavioral Drift | Radar chart (8 signals), drift level, signal bars, period info |
| 📋 Behavioral Pact | Purpose, classification, PII exposure, cost budget, AI Act risk, GDPR basis, tools |
| 📐 Pact Adherence | Score ring, violation list with penalties, bar chart, pre-pact skip count |

All panels load data from the API on page visit: no polling or websockets needed.


## Architecture

```
           ┌──────────────────────────────┐
           │         Audit Trail          │
           │ (hash-chained, tamper-proof) │
           └──────────────┬───────────────┘
          ┌───────────────┼───────────────┐
          │               │               │
  ┌───────▼───────┐ ┌─────▼──────┐ ┌──────▼──────┐
  │ Health Score  │ │   Drift    │ │   Canary    │
  │   (5 dims)    │ │  (8 sigs)  │ │  (runner)   │
  └───────┬───────┘ └─────┬──────┘ └──────┬──────┘
          │               │               │
  ┌───────▼───────┐ ┌─────▼──────┐ ┌──────▼──────┐
  │     Pact      │ │   Alerts   │ │ Regression  │
  │  (adherence)  │ │            │ │  Detection  │
  └───────┬───────┘ └────────────┘ └─────────────┘
          │
  ┌───────▼───────┐
  │  Attestation  │
  │ (signed JWT)  │
  └───────────────┘
```

All modules import `_query_agent_events` and `_extract_calls` from `tappass.observability.health`: a single data-extraction layer over the audit trail.


| File | Purpose |
| --- | --- |
| `tappass/observability/health.py` | Health score computation |
| `tappass/observability/drift.py` | Drift detection (8 signals, JSD) |
| `tappass/registry/pact.py` | Pact model, store, adherence scoring |
| `tappass/canary/store.py` | Canary test definitions + baseline storage |
| `tappass/canary/runner.py` | Canary execution + regression detection |
| `tappass/api/routes/agent_health.py` | All 15 API endpoints |
| `tappass/policy/tokens.py` | Trust attestation in capability tokens |
| `frontend/src/pages/Agents.tsx` | Dashboard visualizations |
| `docs/pacts-guide.md` | Behavioral pacts guide |
| `docs/observability-guide.md` | This file |