Client Onboarding Playbook
Goal: Take a new client from signed contract to governed AI agents in production.
Timeline: 1 week (pilot), 4 weeks (full rollout)
Audience: Solutions engineer, platform team
Pre-engagement Checklist
Before the kickoff call, gather:
- Client’s AI stack: which frameworks (LangChain, CrewAI, custom), which LLMs (OpenAI, Azure, Anthropic)
- Number of agents in scope (start with 1-3 for pilot)
- Compliance requirements: EU AI Act, DORA, NIS2, HIPAA, SOC 2
- Network constraints: can agents reach TapPass? VPN? Air-gapped?
- Existing observability: where do logs go? Splunk, Datadog, ELK?
- Decision maker for “what should be blocked vs. flagged vs. allowed”
Week 1: Pilot
Day 1: Deploy TapPass
Option A: Client hosts (recommended for regulated industries)
The production stack runs via Docker Compose with 7 services: TapPass core, OPA, PostgreSQL, Redis, SPIRE (server + agent), and Cloudflare Tunnel.
```bash
cd deploy

# Create .env.prod from the template
cp ../.env.example .env.prod

# Fill in:
#   TAPPASS_ADMIN_API_KEY  (generate: python -c "import secrets; print('tp_' + secrets.token_urlsafe(32))")
#   TAPPASS_JWT_SECRET     (generate: python -c "import secrets; print(secrets.token_urlsafe(48))")
#   TAPPASS_VAULT_KEY      (generate: python -c "import secrets,base64; print(base64.b64encode(secrets.token_bytes(32)).decode())")
#   POSTGRES_PASSWORD      (generate: python -c "import secrets; print(secrets.token_urlsafe(24))")
#   SPIRE_JOIN_TOKEN       (generate: python -c "import secrets; print(secrets.token_urlsafe(32))")
#   TAPPASS_LICENSE        (from license server)
#   LLM provider keys      (OPENAI_API_KEY, ANTHROPIC_API_KEY, etc.)
```
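You can run the `generate:` recipes from the comments above in one pass. This is a minimal sketch. It assumes `.env.prod` uses plain `KEY=VALUE` lines, as copied from `.env.example`. The names `fill_env()` and `fill_env_file()` are hypothetical helpers, not part of TapPass. Keys that already have a value are left alone.

```python
# Sketch: generate the secrets listed in the .env.prod comments and write them
# into keys that are still empty. Assumes plain KEY=VALUE lines.
import base64
import secrets
from pathlib import Path


def generate_secrets() -> dict:
    # Same recipes as the `generate:` comments in the deploy snippet.
    return {
        "TAPPASS_ADMIN_API_KEY": "tp_" + secrets.token_urlsafe(32),
        "TAPPASS_JWT_SECRET": secrets.token_urlsafe(48),
        "TAPPASS_VAULT_KEY": base64.b64encode(secrets.token_bytes(32)).decode(),
        "POSTGRES_PASSWORD": secrets.token_urlsafe(24),
        "SPIRE_JOIN_TOKEN": secrets.token_urlsafe(32),
    }


def fill_env(text: str, values: dict) -> str:
    """Set KEY=value where KEY is empty or missing; leave filled keys alone."""
    lines = text.splitlines()
    seen = set()
    for i, line in enumerate(lines):
        key, sep, current = line.partition("=")
        key = key.strip()
        if sep and key in values:
            seen.add(key)
            if not current.strip():
                lines[i] = f"{key}={values[key]}"
    # Append keys the template did not contain at all.
    for key, value in values.items():
        if key not in seen:
            lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def fill_env_file(path: str = ".env.prod") -> None:
    # Hypothetical convenience wrapper: rewrite the file in place.
    p = Path(path)
    p.write_text(fill_env(p.read_text(), generate_secrets()))
```

Run `fill_env_file()` from `deploy/` after the `cp` step. You still need to fill in `TAPPASS_LICENSE` and the LLM provider keys by hand. Because already-set values are preserved, running it a second time does not rotate any secret.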
```bash
docker compose --env-file .env.prod -f docker-compose.prod.yml up -d

# After first start, register SPIRE workload entries:
docker compose --env-file .env.prod -f docker-compose.prod.yml exec spire-server \
  bash /opt/spire/register-entries.sh
```

Verify:
```bash
curl -s http://localhost:9620/health | python3 -m json.tool
# Should return: {"status": "healthy", "version": "0.5.0", "storage": "local"}
```

What works immediately: Server starts on port 9620, health check passes, OPA loaded, PostgreSQL migrated, SPIRE issuing certs.
What’s rough:
- HTTPS requires Caddy or nginx in front (a Caddyfile is included in `deploy/`)
- Alternatively, use Cloudflare Tunnel (config in `deploy/setup-tunnel.md`)
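In deployment scripts, the health check above can become a wait loop. This is a minimal standard-library sketch. The names `is_healthy()` and `wait_for_healthy()` are illustrative. The check relies only on the `"status": "healthy"` field from the expected response and assumes the default port 9620.

```python
import json
import time
import urllib.request


def is_healthy(body: bytes) -> bool:
    """True if the health payload parses and reports status == "healthy"."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers JSONDecodeError and bad encodings
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def wait_for_healthy(url: str = "http://localhost:9620/health",
                     timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll until the server reports healthy or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200 and is_healthy(resp.read()):
                    return True
        except OSError:
            pass  # not up yet: refused, reset, timeout or HTTP error
        time.sleep(interval_s)
    return False
```

A script can then exit non-zero when `wait_for_healthy()` returns `False`, before it moves on to registering agents.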
Option B: Quick start (development/demo)
```bash
pip install tappass
tappass up --license <key>
```

This walks through storage selection (in-memory, local PostgreSQL, or Supabase) and starts the server interactively.
Option C: Managed by us
Point their agents at https://<client>.tappass.ai. We handle infra.
Day 2: CISO Sets Up Organization
The CISO runs `tappass init` interactively, or completes the same steps in the dashboard:
```bash
tappass init
```

This walks through:
- Sign in (email/password or Google SSO)
- Create organization
- Create first agent identity
- Select pipeline preset
- Generate a `.env` file with agent credentials
Or via the API:
```bash
# Create org + first agent
curl -X POST http://localhost:9620/auth/signup \
  -H "Content-Type: application/json" \
  -d '{"email": "ciso@client.com", "password": "...", "org_name": "Client Corp"}'
```

Day 3: Configure Pipeline
Choose the starting preset based on industry:
| Industry | Starting preset | Steps |
|---|---|---|
| Financial services | regulated | 45 steps (full pipeline with DORA audit, PCI patterns) |
| Healthcare | regulated | 45 steps (PHI detection, EU model routing) |
| Legal | regulated | 45 steps (privilege detection, matter isolation) |
| Government | regulated | 45 steps (classification levels, air-gap support) |
| SaaS | standard | 38 steps (multi-tenant isolation) |
| Consulting | standard | 38 steps (client data boundaries) |
| Startups | starter | 11 steps (minimal, observe mode) |
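For kickoff notes or rollout scripts, you can capture the table above as data. This is a minimal sketch. `STARTING_PRESETS` and `starting_preset()` are hypothetical names. An unknown industry raises an error rather than falling back to a guess, because the CISO should make that choice.

```python
# Industry -> (starting preset, step count), copied from the table above.
STARTING_PRESETS = {
    "financial services": ("regulated", 45),
    "healthcare": ("regulated", 45),
    "legal": ("regulated", 45),
    "government": ("regulated", 45),
    "saas": ("standard", 38),
    "consulting": ("standard", 38),
    "startups": ("starter", 11),
}


def starting_preset(industry: str) -> str:
    """Look up the starting preset; case- and whitespace-insensitive."""
    try:
        return STARTING_PRESETS[industry.strip().lower()][0]
    except KeyError:
        raise ValueError(
            f"no default preset for {industry!r}; agree one with the CISO"
        ) from None
```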
Pipelines are configured per agent via the CLI or API:
```bash
# View current pipeline
tappass gov pipelines

# View a specific pipeline's steps
tappass gov pipeline default

# Assign a preset to an agent
tappass gov assign <agent-id> --pipeline regulated
```

Or with a YAML override file:
```yaml
preset: regulated
overrides:
  classification:
    custom_patterns:
      - name: client_account_number
        pattern: "ACC-[0-9]{8}"
        level: CONFIDENTIAL
  policy:
    require_human_approval:
      - classification: RESTRICTED
      - classification: TOP_SECRET
```

Day 4: Integrate First Agent
Walk the client’s developer through the SDK:
```bash
pip install tappass
```

```python
from tappass import Agent

agent = Agent(
    "https://tappass.client.internal/v1",
    name="claims-processor",
    api_key="tp_...",
    flags={
        "pii": "mask",
        "mode": "observe",  # Start in observe, switch to enforce later
    },
)

# Their existing OpenAI code works unchanged
response = agent.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
```

Day 4-5: Observe Mode
Keep mode=observe for the first week. This means:
- All 45 pipeline steps run on every call (for the `regulated` preset)
- Detections are logged but not blocked
- Audit trail fills up with real data
- Dashboard shows what would have been blocked
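The difference between the two modes reduces to one decision per call. The sketch below is conceptual and is not TapPass's implementation. `Detection`, `decide()` and the `block`/`flag`/`allow` actions are hypothetical names that echo this playbook's "blocked vs. flagged vs. allowed" language.

```python
import logging
from dataclasses import dataclass
from typing import Iterable

log = logging.getLogger("governance")


@dataclass
class Detection:
    step: str    # pipeline step that fired, e.g. "pii" (hypothetical)
    action: str  # what enforce mode would do: "block", "flag" or "allow"


def decide(mode: str, detections: Iterable[Detection]) -> bool:
    """Return True if the call may proceed to the LLM."""
    detections = list(detections)
    for d in detections:
        # Both modes record every detection for the audit trail.
        log.info("detection step=%s action=%s mode=%s", d.step, d.action, mode)
    would_block = any(d.action == "block" for d in detections)
    if mode == "observe":
        return True  # logged, never blocked
    if mode == "enforce":
        return not would_block
    raise ValueError(f"unknown mode: {mode!r}")
```

In this sketch, flipping `mode` changes only the final return value. The detections logged during observe mode are therefore exactly what enforce mode would have acted on.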
Review meeting at the end of the week:
- Show the dashboard: “Here’s what your agents did this week”
- Show the audit trail: `tappass logs --limit 50`
- Highlight risky calls that would have been blocked
- Agree on what should be blocked vs. flagged vs. allowed
- Set thresholds together with the CISO
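To prepare for the review meeting, you can tally the week's detections per agent and pipeline step. This is only a sketch. It assumes the audit trail has been exported as JSON Lines with hypothetical `agent`, `step` and `would_block` fields. Check the real export format before relying on it.

```python
import json
from collections import Counter
from typing import Iterable


def would_block_summary(lines: Iterable[str]) -> Counter:
    """Count would-have-blocked detections per (agent, step) pair."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the export
        record = json.loads(line)
        if record.get("would_block"):
            counts[(record.get("agent"), record.get("step"))] += 1
    return counts
```

Pass it an open export file and call `.most_common(10)` on the result. That gives the top agent/step pairs to walk through with the CISO when setting thresholds.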
Week 2-4: Full Rollout
Switch to Enforce Mode
```python
agent = Agent(
    url,
    name="claims-processor",
    api_key="tp_...",
    # Keep the other flags; only the mode changes. Now blocks violations.
    flags={"pii": "mask", "mode": "enforce"},
)
```

Or via an environment variable:
```bash
export TAPPASS_FLAGS="mode=enforce"
```

Add Remaining Agents
For each agent:
- Register via CLI: `tappass agents add --name "research-bot" --preset standard`
- Assign capability token scopes (which tools, which data levels)
- Set pipeline overrides if needed
- Deploy SDK integration
- Verify in observe mode for 24h, then switch to enforce
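With many agents, you can script the registration step around the CLI command above. This sketch rests on two assumptions: the agent list is hypothetical, and only the documented `--name` and `--preset` flags are used. With `dry_run=True` (the default), it only returns the commands so you can review them.

```python
import shutil
import subprocess

PRESETS = {"starter", "standard", "regulated"}  # from the preset table

# Hypothetical rollout list: (agent name, preset).
AGENTS = [
    ("research-bot", "standard"),
    ("support-triage", "standard"),
]


def add_commands(agents):
    """Build one `tappass agents add` command per agent, validating presets."""
    cmds = []
    for name, preset in agents:
        if preset not in PRESETS:
            raise ValueError(f"unknown preset {preset!r} for {name}")
        cmds.append(["tappass", "agents", "add", "--name", name, "--preset", preset])
    return cmds


def register(agents, dry_run=True):
    cmds = add_commands(agents)
    if not dry_run:
        if shutil.which("tappass") is None:
            raise RuntimeError("tappass CLI not found on PATH")
        for cmd in cmds:
            subprocess.run(cmd, check=True)  # stop at the first failure
    return cmds
```

The remaining steps (scopes, overrides, SDK deployment and the 24h observe window) stay manual.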
Configure SIEM Export
TapPass supports three SIEM formats: CEF (Splunk/ArcSight), OCSF (AWS Security Lake/CrowdStrike), and JSON (generic webhook).
Configure via the API:
```bash
curl -X PUT http://localhost:9620/admin/siem/config \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "enabled": true,
    "destination": {
      "type": "splunk_hec",
      "url": "https://client-splunk.internal:8088/services/collector",
      "token": "..."
    },
    "format": "cef",
    "severity_filter": "detection"
  }'
```

Severity filter options: `all`, `detection` (something detected), `action` (action taken), `block` (only blocks).
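You can also build the same SIEM call from Python, which makes it easy to check the `severity_filter` value against the documented options before sending. This sketch uses only the standard library. `siem_config_request()` is a hypothetical helper. The lowercase `ocsf` and `json` format strings are assumed by analogy with `cef`.

```python
import json
import urllib.request

FORMATS = {"cef", "ocsf", "json"}  # "ocsf"/"json" spellings are assumed
SEVERITY_FILTERS = {"all", "detection", "action", "block"}


def siem_config_request(base_url: str, admin_key: str, destination: dict,
                        fmt: str = "cef",
                        severity_filter: str = "detection") -> urllib.request.Request:
    """Build (but do not send) the PUT /admin/siem/config request."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format {fmt!r}")
    if severity_filter not in SEVERITY_FILTERS:
        raise ValueError(f"unsupported severity_filter {severity_filter!r}")
    body = {
        "enabled": True,
        "destination": destination,
        "format": fmt,
        "severity_filter": severity_filter,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/admin/siem/config",
        data=json.dumps(body).encode(),
        method="PUT",
        headers={
            "Authorization": f"Bearer {admin_key}",
            "Content-Type": "application/json",
        },
    )
```

Send the result with `urllib.request.urlopen(req)`. The payload matches the curl example above.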
Configure Alerting
```bash
curl -X POST http://localhost:9620/admin/webhooks \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://client-slack.webhook.url",
    "events": ["violation", "escalation", "anomaly"]
  }'
```

Run Status Check
Before handover, run the full diagnostic:
```bash
tappass status
```

This checks: config, server connection, auth, agents, pipeline, database, OPA, SPIRE, license.
Handover
Deliver to the client:
- Dashboard access for CISO + security team
- Runbook: “What to do when TapPass blocks something”
- Pipeline config (version controlled in their repo)
- Monitoring: `/health/ready` endpoint added to their uptime tool
- SIEM integration verified (events flowing)
- Escalation path: who to call if governance needs tuning
Common Gotchas
| Problem | Solution |
|---|---|
| Agent timeouts after adding TapPass | Check per-step latency via `/admin/metrics`. The `starter` preset adds ~20 ms and `regulated` ~100 ms; enabling `llm_judge` adds another ~200 ms per call. |
| Too many false positives on PII | Tune the PII allowlist. Product names that look like person names are common (“Anna Router”, “Iris Scanner”); add them to `pii.allowlist` in the pipeline config. |
| Developer pushback (“it’s slowing us down”) | Start with mode=observe. Show the dashboard. Let the data speak. No enforcement until the CISO sees what’s happening. |
| CISO wants everything blocked immediately | Push back. Observe first, enforce second. Blocking without data causes outages and developer revolt. |
| Streaming responses break | Use agent.chat.completions.create(stream=True). The SDK handles governed streaming natively. |
| Multiple LLM providers | TapPass proxies to all configured providers. Configure them via environment variables (see `.env.example`). One agent can use multiple models. |
| `tappass up` hangs | Check whether port 9620 is already in use. Run `tappass status` for diagnostics. |
| Database migration fails | Migrations are in `deploy/migrations/` (currently 12 migration files). Check the Postgres logs for errors: `docker compose --env-file .env.prod -f docker-compose.prod.yml logs postgres`. |
| SPIRE not issuing certs | Usually `register-entries.sh` was not run after the first deploy. Check with `docker compose --env-file .env.prod -f docker-compose.prod.yml exec spire-server /opt/spire/bin/spire-server entry show`. |
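For the `tappass up` hangs row, this checks whether something already accepts connections on port 9620. It is a minimal sketch using the standard `socket` module, and `port_in_use()` is an illustrative name.

```python
import socket


def port_in_use(port: int = 9620, host: str = "127.0.0.1") -> bool:
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising.
        return s.connect_ex((host, port)) == 0
```

This only detects a listening process. A process that has bound the port without listening will not show up, so fall back to `tappass status` if the check comes back clean.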