TapPass Security Testing Roadmap
Status as of March 2026, post Round 12
What We’ve Done vs. What’s Left
| DONE | TODO |
|---|---|
| ✅ Functional security (R1–R12, 300+ tests) | ❌ Penetration testing (OWASP API) |
| ✅ PII detection (24 obfuscation techniques) | ❌ Load / stress testing |
| ✅ Injection detection (threshold tuned) | ❌ SAST (static analysis) |
| ✅ Output scanning (normalizer, decoder) | ❌ SCA (dependency CVE scan) |
| ✅ Role coverage (6 roles) | ❌ DAST (automated vuln scanning) |
| ✅ Multimodal content support | ❌ Fuzzing |
| ✅ ReDoS resistance (verified <2s) | ❌ Chaos / resilience testing |
| ✅ Rate limiting (basic) | ❌ Infrastructure hardening |
| ✅ Auth enforcement | ❌ Compliance audit (SOC2/GDPR) |
| ✅ Security headers middleware | ❌ Secrets rotation & management |
| ✅ Error sanitization (no stack traces) | ❌ Streaming response scanning |
| ✅ Basic edge cases (null, empty, large) | ❌ Concurrent race conditions |
| | ❌ CI security gates |

1. PENETRATION TEST (P0: Do First)
OWASP API Security Top 10 (2023) test against the /v1/chat/completions endpoint and all admin routes.
We’ve tested the governance pipeline extensively but NOT the web application layer: FastAPI routes, auth flows, admin APIs, CORS, SSRF, etc.
| OWASP API # | Risk | TapPass Surface | Status |
|---|---|---|---|
| API1: Broken Object Level Auth | Can agent A read agent B’s config? | /v1/agents/{id}, /v1/pipelines/{id} | ❌ Untested |
| API2: Broken Authentication | JWT bypass, token reuse, key bruteforce | /v1/auth/*, Bearer tokens | ⚠️ Basic only |
| API3: Broken Object Property Level Auth | Can user modify admin-only fields? | Pipeline config, rate limit overrides | ❌ Untested |
| API4: Unrestricted Resource Consumption | Memory exhaustion, CPU spike | Large payloads, many connections | ❌ Untested |
| API5: Broken Function Level Auth | Admin routes accessible to agents? | /v1/admin/*, /v1/pipelines/create | ❌ Untested |
| API6: Unrestricted Access to Sensitive Business Flows | Abuse governance bypass | Repeated block attempts, policy enum | ❌ Untested |
| API7: Server Side Request Forgery | Can LLM base_url be SSRF vector? | base_url config, webhook URLs | ❌ Untested |
| API8: Security Misconfiguration | Headers, CORS, debug mode, verbose errors | All endpoints | ⚠️ Headers exist, CORS unchecked |
| API9: Improper Inventory Management | Shadow/deprecated endpoints exposed? | All routes | ❌ Untested |
| API10: Unsafe API Consumption | Trusting upstream LLM responses blindly? | OpenAI response parsing | ⚠️ Output scanner exists |
```bash
# Option A: Manual with tools
pip install httpie

# Test BOLA
http GET localhost:9620/v1/agents/OTHER_AGENT_ID Authorization:"Bearer agent_a_token"

# Test BFLA
http POST localhost:9620/v1/admin/pipelines Authorization:"Bearer regular_user_token"

# Test SSRF
# Set base_url to http://169.254.169.254/latest/meta-data/ (AWS metadata)

# Option B: Automated
# Use OWASP ZAP or Burp Suite against localhost:9620
docker run -t ghcr.io/zaproxy/zaproxy:stable zap-api-scan.py \
  -t http://host.docker.internal:9620/openapi.json -f openapi

# Option C: Hire external pentester (recommended for enterprise)
# Budget: €3,000–8,000 for a 3–5 day engagement
```

Specific Tests Needed
- JWT token forgery (alg=none, key confusion)
- Admin API key enumeration / timing attack
- IDOR on agent/pipeline CRUD
- SSRF via `base_url` or webhook URL
- Path traversal on any file-serving routes
- HTTP verb tampering (GET vs POST vs PUT vs DELETE)
- Content-Type confusion (XML, form-data, etc.)
- Request smuggling (HTTP/1.1 vs HTTP/2)
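
The `alg=none` forgery check from the list above can be scripted with the standard library alone. The following is a minimal sketch, assuming the API accepts JWT bearer tokens; the claim names (`sub`, `role`) and the helper names are hypothetical and should be replaced with whatever TapPass tokens actually carry.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as the JWT format requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def forge_alg_none(claims: dict) -> str:
    """Build an unsigned JWT whose header declares alg=none.

    A correctly configured server must reject this token outright.
    """
    header = {"alg": "none", "typ": "JWT"}
    parts = [
        b64url(json.dumps(header, separators=(",", ":")).encode()),
        b64url(json.dumps(claims, separators=(",", ":")).encode()),
        "",  # empty signature segment
    ]
    return ".".join(parts)


if __name__ == "__main__":
    # Hypothetical claims; substitute the fields real tokens contain.
    print(forge_alg_none({"sub": "agent_a", "role": "admin"}))
```

Send the printed token as the Bearer value against a protected route. Any response other than an authentication failure would be a finding.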
2. LOAD & STRESS TESTING (P0: Do First)
Verify TapPass doesn’t degrade security under load. Does the PII scanner skip checks when overloaded? Do rate limits hold?
Security under load is different from security at rest. Many systems silently skip expensive checks (Presidio NER, regex normalizer) when response time budgets are exceeded.
| Test | Tool | Target | Pass Criteria |
|---|---|---|---|
| Sustained load | k6 / Locust | 100 req/s for 5 min | No 5xx, p99 < 5s, PII still detected |
| Spike test | k6 | 0 → 500 req/s instant | Rate limiter engages, no crash |
| Soak test | k6 | 50 req/s for 1 hour | No memory leak, stable latency |
| PII under load | Custom | 100 req/s all with SSN | 100% detection rate maintained |
| Large payload flood | k6 | 49KB messages at 50 req/s | No OOM, all scanned |
| Concurrent same-agent | k6 | 200 concurrent from 1 agent | Rate limiter per-agent works |
| Connection exhaustion | Custom | 10,000 half-open connections | Server stays responsive |
| Slowloris | slowhttptest | Slow headers/body | Uvicorn timeout works |
```bash
# Install k6
brew install k6

# Basic load test
cat > load-test.js << 'EOF'
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 50 },
    { duration: '2m', target: 100 },
    { duration: '30s', target: 0 },
  ],
};

const API = 'http://localhost:9620/v1/chat/completions';
const HEADERS = {
  'Content-Type': 'application/json',
  'Authorization': `Bearer ${__ENV.TAPPASS_API_KEY}`,
};

export default function () {
  // Mix of clean + PII requests
  const hasPII = Math.random() > 0.5;
  const body = JSON.stringify({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: hasPII ? 'SSN: 234-56-7890' : 'What is 2+2?' }],
    max_tokens: 20,
  });

  const res = http.post(API, body, { headers: HEADERS, timeout: '30s' });

  check(res, {
    'status is 200': (r) => r.status === 200,
    'PII detected when sent': (r) => {
      if (!hasPII) return true;
      const tappass = JSON.parse(r.body).tappass || {};
      return (tappass.steps || []).some(s => s.step === 'detect_pii' && s.detected);
    },
  });
}
EOF
k6 run -e TAPPASS_API_KEY="$TAPPASS_ADMIN_API_KEY" load-test.js
```

Critical Question
Does PII detection rate drop below 100% under load?
If yes, we need circuit breakers that BLOCK requests when the scanner is overloaded, not skip scanning.
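
One way to get block-instead-of-skip behaviour is to give the scanner a deadline and treat a missed deadline as a block decision. The sketch below is a simpler stand-in for a full circuit breaker, not TapPass code: `scan_or_block`, the verdict strings, and the assumption that a scanner is a callable returning `True` when PII is found are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as ScanTimeout

# Shared worker pool for scanner calls (size is an arbitrary assumption).
_POOL = ThreadPoolExecutor(max_workers=4)


def scan_or_block(scanner, text: str, timeout_s: float = 2.0) -> str:
    """Return "allow" only if the scan finished in time and found nothing."""
    future = _POOL.submit(scanner, text)
    try:
        found_pii = future.result(timeout=timeout_s)
    except ScanTimeout:
        # Best effort only: a scan that is already running keeps running.
        future.cancel()
        return "block"  # overloaded scanner: refuse, never skip
    except Exception:
        return "block"  # scanner crashed: refuse as well
    return "block" if found_pii else "allow"


if __name__ == "__main__":
    def slow_scanner(text):
        time.sleep(0.3)  # simulates an overloaded NER model
        return False

    print(scan_or_block(slow_scanner, "What is 2+2?", timeout_s=0.05))
```

The design choice is that the only path to "allow" is a completed scan with a negative result. Every other outcome, including overload, maps to "block".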
3. STATIC APPLICATION SECURITY TESTING: SAST (P1)
Automated code scanning for vulnerabilities in the Python source code.
The codebase has 175 Python files. Manual review caught pipeline bugs but didn’t systematically check for SQL injection, path traversal, insecure deserialization, hardcoded secrets, etc.
| Tool | What It Finds | Cost |
|---|---|---|
| Bandit | Python-specific security issues (eval, exec, hardcoded passwords, weak crypto) | Free |
| Semgrep | Pattern-based bugs (OWASP rules, custom rules) | Free (OSS) / Paid (Cloud) |
| CodeQL | Deep dataflow analysis (taint tracking) | Free for OSS on GitHub |
| Ruff | Fast linter with some security rules | Free |
```bash
# Bandit: Python security linter
pip install bandit
bandit -r tappass/ -ll -ii  # medium+ severity, medium+ confidence

# Semgrep: pattern-based scanner with OWASP rules
pip install semgrep
semgrep --config=auto tappass/
semgrep --config=p/owasp-top-ten tappass/
semgrep --config=p/python tappass/

# Both should run in CI (add to .github/workflows/ci.yml)
```

Known Issues to Check
- `eval()`/`exec()` anywhere in codebase
- `yaml.load()` without `SafeLoader`
- `pickle.loads()` on untrusted data
- Hardcoded secrets in source (not just `.env`)
- `os.path.join()` with user input (path traversal)
- `subprocess` with `shell=True`
- Insecure random (`random` vs `secrets`)
- Missing `httponly`/`secure` on cookies (if any)
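
Two of the checks above (`eval()`/`exec()` calls and `shell=True`) are easy to illustrate with Python's own `ast` module. This is only a sketch of the kind of pattern Bandit looks for, not a replacement for it; the function name `find_risky_calls` is hypothetical.

```python
import ast

RISKY_CALLS = {"eval", "exec"}


def find_risky_calls(source: str, filename: str = "<string>") -> list:
    """Return sorted (line, description) pairs for eval/exec calls and shell=True."""
    findings = []
    for node in ast.walk(ast.parse(source, filename)):
        if not isinstance(node, ast.Call):
            continue
        # Direct calls such as eval("...") or exec("...")
        if isinstance(node.func, ast.Name) and node.func.id in RISKY_CALLS:
            findings.append((node.lineno, f"{node.func.id}() call"))
        # Any call passing the literal keyword argument shell=True
        for kw in node.keywords:
            if (
                kw.arg == "shell"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
            ):
                findings.append((node.lineno, "shell=True"))
    return sorted(findings)


if __name__ == "__main__":
    sample = "import subprocess\nx = eval('1+1')\nsubprocess.run('ls', shell=True)\n"
    for line, what in find_risky_calls(sample):
        print(f"line {line}: {what}")
```

A syntactic scan like this misses aliased or indirect calls, which is where the dataflow tools in the table (Semgrep, CodeQL) earn their place.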
4. SOFTWARE COMPOSITION ANALYSIS: SCA (P1)
Scan all Python dependencies for known CVEs.
TapPass has ~25 direct dependencies, including cryptography, pyjwt, httpx, litellm, and spacy, and all of them are attack surface. A single vulnerable transitive dependency is enough to compromise the service.
```bash
# pip-audit (official PyPA tool)
pip install pip-audit
pip-audit -r requirements.txt

# Safety (alternative; Safety 3.x deprecates `check` in favor of `safety scan`)
pip install safety
safety check -r requirements.txt

# Trivy (also scans Docker images)
brew install trivy
trivy fs --scanners vuln .
trivy image tappass:latest

# GitHub Dependabot: enable in repo settings (free)
# Snyk: more detailed, integrates with CI (free tier)
```

What We Already Know
- `.env` is world-readable (644) → should be 600
- `OPENAI_API_KEY` is in `.env` (known blocker C2)
- Dependency versions use `>=` (floor only, no ceiling) → could pull vulnerable newer versions
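
The floor-only pinning issue can be listed mechanically. The sketch below assumes a plain `requirements.txt` with one requirement per line; the function name `floor_only` and the sample versions in the demo are illustrative, not taken from the TapPass repository.

```python
import re

# name, optional [extras], then the version specifier text
_REQ = re.compile(r"^\s*([A-Za-z0-9_.\-]+)(\[[^\]]*\])?\s*(.*)$")


def floor_only(requirement_lines) -> list:
    """Return package names that have a >= floor but no upper bound or exact pin."""
    names = []
    for raw in requirement_lines:
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = _REQ.match(line)
        if not match:
            continue
        name, _extras, spec = match.groups()
        spec = spec.split(";", 1)[0]  # ignore environment markers
        has_floor = ">=" in spec
        has_ceiling = any(op in spec for op in ("<", "==", "~="))
        if has_floor and not has_ceiling:
            names.append(name)
    return names


if __name__ == "__main__":
    # Hypothetical pins for illustration only.
    sample = ["cryptography>=41.0", "pyjwt>=2.8,<3", "httpx==0.27.0"]
    print(floor_only(sample))
```

Packages reported here are candidates for an upper bound or, better, a fully resolved lock file that pip-audit can scan.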
5. DYNAMIC APPLICATION SECURITY TESTING: DAST (P2)
Black-box automated vulnerability scanning against the running API.
DAST tests the actual deployed surface and finds misconfigurations, missing headers, unexpected endpoints, and parameter pollution.
```bash
# OWASP ZAP (free, excellent for APIs)
# The volume mount lets ZAP write report.html back to the current directory.
docker run -v "$(pwd):/zap/wrk/:rw" -t ghcr.io/zaproxy/zaproxy:stable zap-api-scan.py \
  -t http://host.docker.internal:9620/openapi.json \
  -f openapi -r report.html

# Nuclei (fast, template-based)
brew install nuclei
nuclei -u http://localhost:9620 -t http/ -t exposures/ -t misconfiguration/

# Nikto (web server scanner)
nikto -h http://localhost:9620
```

Specific Checks
- OpenAPI spec exposed? (`/docs`, `/redoc`, `/openapi.json`)
- Debug mode enabled? (`/debug`, `/__debug__`)
- Admin routes without auth?
- CORS misconfiguration (wildcard `*`?)
- Missing security headers on all routes
- Information disclosure in error responses
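
The header and CORS checks above reduce to a small pure function once the response headers are in hand. This sketch works on a plain dict so it can be fed from any HTTP client; the list of required headers is an assumption about what the TapPass middleware should set, and `audit_headers` is a hypothetical name.

```python
# Assumed baseline; adjust to the headers the middleware is meant to emit.
REQUIRED_HEADERS = (
    "strict-transport-security",
    "x-content-type-options",
    "content-security-policy",
)


def audit_headers(headers: dict) -> list:
    """Return a list of problems found in one response's headers."""
    lower = {name.lower(): value for name, value in headers.items()}
    problems = [f"missing {name}" for name in REQUIRED_HEADERS if name not in lower]
    if lower.get("access-control-allow-origin", "").strip() == "*":
        problems.append("CORS wildcard origin")
    return problems


if __name__ == "__main__":
    print(audit_headers({"Access-Control-Allow-Origin": "*"}))
```

Running it against every route in the OpenAPI spec, rather than one endpoint, covers the "on all routes" part of the check.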
6. FUZZING (P2)
Send malformed, random, and boundary-case inputs to every API parameter.
Our tests use reasonable inputs. Fuzzers find crashes from inputs no human would think of: malformed JSON, binary data in string fields, extreme Unicode, and so on.
```bash
# Schemathesis: OpenAPI-aware fuzzer
pip install schemathesis
schemathesis run http://localhost:9620/openapi.json \
  --base-url http://localhost:9620 \
  --hypothesis-max-examples=1000 \
  -H "Authorization: Bearer $TAPPASS_ADMIN_API_KEY"

# RESTler: Microsoft's stateful REST API fuzzer
# https://github.com/microsoft/restler-fuzzer

# Custom fuzzer for PII patterns
python3 -c "
import random
# Generate random strings that ALMOST match SSN/email/CC patterns
# to find regex edge cases
for _ in range(10000):
    # Random digit groups
    a, b, c = random.randint(0, 999), random.randint(0, 99), random.randint(0, 9999)
    sep = random.choice(['-', '.', '/', '|', '_', ' ', '\t'])
    print(f'{a:03d}{sep}{b:02d}{sep}{c:04d}')
"
```

Targets
- `messages[].content`: arbitrary strings, binary, nulls
- `messages[].role`: unknown roles, empty, very long
- `model`: SQL injection, path traversal, command injection
- `max_tokens`: negative, zero, MAX_INT, float, string
- `temperature`: extreme values
- Headers: very long, duplicate, conflicting
- Body: malformed JSON, XML, form-data
- Encoding: invalid UTF-8, mixed encodings
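
For targets that a schema-driven fuzzer may not reach, a hand-rolled generator of boundary cases is often enough to start. The sketch below covers only `max_tokens` and `messages[].role` from the list above; the specific values and the name `fuzz_bodies` are illustrative choices, and the request shape is assumed to follow the OpenAI-compatible body used elsewhere in this roadmap.

```python
import copy
import json

# Boundary values for max_tokens: negative, zero, MAX_INT, overflow, float, string, null
MAX_TOKENS_CASES = [-1, 0, 2**31 - 1, 2**63, 1.5, "20", None]
# Role values: empty, unknown, very long, null
ROLE_CASES = ["", "root", "user" * 10_000, None]

BASE_BODY = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "ping"}],
}


def fuzz_bodies():
    """Yield request bodies that each mutate exactly one field of BASE_BODY."""
    for value in MAX_TOKENS_CASES:
        body = copy.deepcopy(BASE_BODY)
        body["max_tokens"] = value
        yield body
    for role in ROLE_CASES:
        body = copy.deepcopy(BASE_BODY)
        body["messages"][0]["role"] = role
        yield body


if __name__ == "__main__":
    for body in fuzz_bodies():
        # Truncate for display; the very long role case is about 40 KB.
        print(json.dumps(body)[:80])
```

Each body changes a single field, so a crash or a 5xx response points directly at the parameter that caused it.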
7. CHAOS & RESILIENCE TESTING (P2)
What happens when components fail? Does security degrade gracefully or fail open?
Critical Questions

| Scenario | Expected Behavior | Risk if Wrong |
|---|---|---|
| OPA goes down | Block all requests (fail closed) | Policies bypassed |
| Presidio/spaCy crashes | Block requests with PII-like content | PII leaks through |
| OpenAI API timeout | Return error, don’t leak partial response | Partial data exposure |
| Redis/store unavailable | Rate limits unenforced? | DoS possible |
| Disk full | Logs stop, but scanning continues | Audit trail gap |
| Out of memory | Process killed, restarts clean | Stale state |
```bash
# Kill OPA and send PII request
pkill -f opa
curl -X POST localhost:9620/v1/chat/completions \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"SSN 234-56-7890"}]}'
# MUST return error, not forward unscanned

# Simulate slow LLM (add 30s delay)
# Does the output scanner still run on timeout responses?

# Memory pressure
stress --vm 2 --vm-bytes 4G --timeout 60s &
# Send PII request during memory pressure
```

The Golden Rule
Every failure mode must FAIL CLOSED. If any component is down, requests must be blocked, never forwarded unscanned.
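
In code, the rule can be enforced at one choke point by wrapping every policy or scanner call so that an internal failure is indistinguishable from a denial. The following is a sketch under that assumption; `fail_closed`, `Blocked`, and `opa_allows` are hypothetical names, and the OPA call is simulated rather than real.

```python
import functools


class Blocked(Exception):
    """Raised when a request must not be forwarded upstream."""


def fail_closed(check):
    """Wrap a check so that any internal failure becomes a block decision."""

    @functools.wraps(check)
    def wrapper(*args, **kwargs):
        try:
            allowed = check(*args, **kwargs)
        except Exception as exc:
            # Component down or misbehaving: refuse rather than skip the check.
            raise Blocked(f"{check.__name__} unavailable") from exc
        if not allowed:
            raise Blocked(f"{check.__name__} denied the request")
        return True

    return wrapper


@fail_closed
def opa_allows(request: dict) -> bool:
    # Stand-in for a real OPA query; simulates the daemon being down.
    raise ConnectionError("OPA unreachable")


if __name__ == "__main__":
    try:
        opa_allows({"agent": "a"})
    except Blocked as err:
        print(f"blocked: {err}")
```

With this shape, the request handler forwards to the LLM only when every wrapped check has returned without raising.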
8. INFRASTRUCTURE HARDENING (P1)
Current Issues Found

| Issue | Severity | Fix |
|---|---|---|
| .env is 644 (world-readable) | Medium | chmod 600 .env |
| OPENAI_API_KEY accessible to all processes | High | Egress firewall or remove from .env |
| No TLS (HTTP only) | High | Add TLS termination (nginx/caddy) |
| No request size limit at reverse proxy level | Medium | Add client_max_body_size in nginx |
| Uvicorn runs as user process, not containerized | Medium | Use Docker with non-root user |
| No health check endpoint monitoring | Low | Add uptime monitoring |
| OPA runs without TLS | Medium | Enable OPA TLS |
| No log rotation | Low | Add logrotate or Docker log driver |
| OpenAPI docs exposed in production | Low | Disable /docs and /redoc |
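
The `.env` permission issue in the first row is cheap to guard against at startup or in CI. A minimal sketch for POSIX systems, assuming the secrets file path is known; `is_private` is a hypothetical helper name.

```python
import os
import stat
import tempfile


def is_private(path: str) -> bool:
    """True if neither group nor others hold any permission bit on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    # Demonstrate on a throwaway file rather than a real .env
    with tempfile.NamedTemporaryFile() as handle:
        os.chmod(handle.name, 0o644)
        print("644 private?", is_private(handle.name))
        os.chmod(handle.name, 0o600)
        print("600 private?", is_private(handle.name))
```

Refusing to start when the check fails turns a documentation item into an enforced invariant.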
Docker Hardening
```dockerfile
# Non-root user (Alpine/BusyBox syntax)
RUN addgroup -S tappass && adduser -S tappass -G tappass
USER tappass
```

```bash
# Read-only filesystem
docker run --read-only --tmpfs /tmp tappass

# No new privileges
docker run --security-opt no-new-privileges tappass

# Resource limits
docker run --memory=2g --cpus=2 tappass
```

9. COMPLIANCE TESTING (P2)
SOC 2 Type II Requirements

| Control | TapPass Status | Gap |
|---|---|---|
| Access control | ✅ API keys, agent registration | Need RBAC audit trail |
| Encryption in transit | ❌ No TLS | Need TLS |
| Encryption at rest | ⚠️ Memory storage only | Need encrypted persistent store |
| Audit logging | ⚠️ structlog exists | Need tamper-proof audit log |
| Incident response | ❌ No alerting | Need PagerDuty/Slack alerts |
| Change management | ⚠️ Git + CI | Need approval gates |
| Vendor management | ❌ No SLA with OpenAI | Need vendor risk assessment |
| Business continuity | ❌ Single instance | Need HA / failover |
GDPR Requirements
| Requirement | Status | Gap |
|---|---|---|
| Data minimization | ✅ PII redacted before LLM | ✓ |
| Right to erasure | ❌ No data deletion API | Need purge endpoint |
| Data processing records | ⚠️ Logs exist | Need structured ROPA |
| DPA with OpenAI | ❌ | Need signed DPA |
| Cross-border transfer | ⚠️ Data goes to OpenAI (US) | Need SCCs or EU hosting |
| Breach notification | ❌ No detection | Need monitoring + alerting |
10. STREAMING RESPONSE SCANNING (P1)
When `stream: true` is set, does the output scanner work on streamed chunks?
Most production deployments use streaming. If the output scanner only works on complete responses, PII leaks through in stream mode.
```bash
curl -X POST localhost:9620/v1/chat/completions \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","stream":true,"messages":[{"role":"user","content":"Generate a fake SSN"}]}' \
  --no-buffer
# Watch: does SSN appear in chunks before being redacted?
```

The LLM might output “234” in chunk 1, “-56-” in chunk 2, “7890” in chunk 3. No single chunk contains a full SSN, so per-chunk scanning misses it.
Fix Options
- Buffer full response, scan, then stream (adds latency)
- Sliding window scanner on accumulated chunks (complex)
- Disable streaming when output scanning is enabled (simplest)
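
The sliding-window option can be sketched as a generator that scans the accumulated text and always holds back a tail long enough to contain a partial match. This is an illustration for a single fixed-length pattern (a dashed SSN), not the TapPass output scanner; `redact_stream`, `MASK`, and the pattern are assumptions. The pattern deliberately omits word boundaries so that streamed and buffered scanning err on the side of redaction.

```python
import re

SSN = re.compile(r"\d{3}-\d{2}-\d{4}")
MASK = "[REDACTED]"
WINDOW = 11  # length of a dashed SSN, the longest match we must catch


def redact_stream(chunks):
    """Yield redacted text, catching SSNs that straddle chunk boundaries."""
    held = ""
    for chunk in chunks:
        # Scan everything not yet released, including the previous tail.
        held = SSN.sub(MASK, held + chunk)
        # An incomplete SSN is at most WINDOW - 1 characters long, so only
        # the last WINDOW - 1 characters can still grow into a match.
        cut = max(len(held) - (WINDOW - 1), 0)
        if cut:
            yield held[:cut]
        held = held[cut:]
    if held:
        yield held  # already scanned; safe to flush at end of stream


if __name__ == "__main__":
    pieces = ["My SSN is 234", "-56-", "7890, thanks"]
    print("".join(redact_stream(pieces)))
```

The cost is a fixed delay of ten characters rather than the full-response latency of the buffering option. Patterns with variable or unbounded length (names, addresses) need a larger or adaptive window, which is where the complexity noted above comes from.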
11. CONCURRENT RACE CONDITIONS (P2)
Thread-safety of shared state: rate limit counters, session tracking, scanner caches.
```bash
# Send 100 concurrent requests from the same agent
# Check: rate limiter counts correctly? Session tracking isolated?
ab -n 100 -c 50 -H "Authorization: Bearer $KEY" \
  -T application/json -p payload.json \
  http://localhost:9620/v1/chat/completions

# Race condition on pipeline context
# Can two concurrent requests share a PipelineContext?
```

Specific Risks
- Rate limit counter race (100 requests, only 60 should pass)
- Session tracking cross-contamination
- Scanner cache serving stale results
- Concurrent writes to audit log (interleaved entries)
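
The rate limit counter race in the first bullet comes down to making check-and-increment one atomic step. The sketch below shows the locked version for a single window and a small harness that reproduces the "100 requests, only 60 should pass" scenario in-process; window reset is omitted, and `FixedWindowLimiter` and `hammer` are hypothetical names rather than TapPass classes.

```python
import threading


class FixedWindowLimiter:
    """Counter for one rate-limit window; check and increment happen under a lock."""

    def __init__(self, limit: int):
        self.limit = limit
        self._count = 0
        self._lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self._lock:
            if self._count >= self.limit:
                return False
            self._count += 1
            return True


def hammer(limiter: FixedWindowLimiter, n_threads: int = 100) -> int:
    """Call try_acquire from n_threads threads and return how many were admitted."""
    results = [False] * n_threads  # one slot per thread, so no shared appends

    def worker(index: int) -> None:
        results[index] = limiter.try_acquire()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(results)


if __name__ == "__main__":
    print(hammer(FixedWindowLimiter(limit=60)))
```

If the real limiter keeps its counter in an external store, the same property has to come from an atomic operation in that store rather than from a process-local lock.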
12. CI SECURITY GATES (P1)
Add automated security checks to .github/workflows/ci.yml.
Recommended Pipeline
```yaml
jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # SAST
      - name: Bandit
        run: |
          pip install bandit
          bandit -r tappass/ -ll -ii -f json -o bandit.json

      # SCA
      - name: pip-audit
        run: |
          pip install pip-audit
          pip-audit -r requirements.txt --strict

      # Semgrep
      - name: Semgrep
        uses: returntocorp/semgrep-action@v1
        with:
          config: >-
            p/python
            p/owasp-top-ten
            p/security-audit

      # Container scanning
      - name: Trivy
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: fs
          severity: HIGH,CRITICAL

      # Secret scanning
      - name: Gitleaks
        uses: gitleaks/gitleaks-action@v2

      # DAST (on deployed preview)
      - name: ZAP API scan
        uses: zaproxy/action-api-scan@v0.9.0
        with:
          target: http://localhost:9620/openapi.json
```

Priority Matrix
```
      IMPACT
High ┌────────────────────────────────┐
     │                                │
     │  P0: Pen test                  │
     │  P0: Load test (PII under load)│
     │  P1: Streaming scan            │
     │  P1: SAST (Bandit/Semgrep)     │
     │  P1: SCA (pip-audit)           │
     │  P1: Infra hardening (TLS,.env)│
     │  P1: CI security gates         │
     │                                │
Med  │  P2: DAST (ZAP)                │
     │  P2: Fuzzing                   │
     │  P2: Chaos testing             │
     │  P2: Race conditions           │
     │  P2: Compliance audit          │
     │                                │
Low  │  P3: Supply chain (SBOM)       │
     │  P3: Red team social eng       │
     │                                │
     └────────────────────────────────┘
      Easy ──────────── Hard
             EFFORT
```

Estimated Effort
| Category | Tool/Method | Time | Cost |
|---|---|---|---|
| Pen test | External firm | 3–5 days | €3,000–8,000 |
| Pen test | Self (OWASP ZAP + manual) | 2–3 days | Free |
| Load test | k6 | 1 day | Free |
| SAST | Bandit + Semgrep | 2 hours setup + fix cycle | Free |
| SCA | pip-audit + Trivy | 30 min setup | Free |
| DAST | ZAP API scan | 2 hours | Free |
| Fuzzing | Schemathesis | 1 day | Free |
| Chaos test | Manual scripts | 1 day | Free |
| Infra hardening | TLS + Docker + .env | 1 day | Free |
| Streaming scan | Code change | 2–3 days | Free |
| CI gates | GitHub Actions | 2 hours | Free |
| Compliance | Audit prep | 2–4 weeks | €5,000–20,000 (auditor) |
Total self-service: ~8–10 days of work
Total with external pen test + compliance: add €8,000–28,000
Quick Wins (Do Today)
```bash
# 1. Fix .env permissions
chmod 600 .env

# 2. Install and run Bandit
pip install bandit && bandit -r tappass/ -ll

# 3. Install and run pip-audit
pip install pip-audit && pip-audit

# 4. Disable OpenAPI docs in production
# Set TAPPASS_DOCS_ENABLED=false

# 5. Test streaming PII leakage
curl -N localhost:9620/v1/chat/completions \
  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","stream":true,"messages":[{"role":"user","content":"Generate a fake customer SSN"}]}'
```