TapPass Security Testing Roadmap

Status as of March 2026, post Round 12.

What We’ve Done vs. What’s Left

                          DONE                           │              TODO
                                                         │
  ✅ Functional security (R1–R12, 300+ tests)            │  ❌ Penetration testing (OWASP API)
  ✅ PII detection (24 obfuscation techniques)            │  ❌ Load / stress testing
  ✅ Injection detection (threshold tuned)                │  ❌ SAST (static analysis)
  ✅ Output scanning (normalizer, decoder)               │  ❌ SCA (dependency CVE scan)
  ✅ Role coverage (6 roles)                              │  ❌ DAST (automated vuln scanning)
  ✅ Multimodal content support                           │  ❌ Fuzzing
  ✅ ReDoS resistance (verified <2s)                      │  ❌ Chaos / resilience testing
  ✅ Rate limiting (basic)                                │  ❌ Infrastructure hardening
  ✅ Auth enforcement                                     │  ❌ Compliance audit (SOC2/GDPR)
  ✅ Security headers middleware                          │  ❌ Secrets rotation & management
  ✅ Error sanitization (no stack traces)                 │  ❌ Streaming response scanning
  ✅ Basic edge cases (null, empty, large)                │  ❌ Concurrent race conditions
                                                         │  ❌ CI security gates

1. PENETRATION TEST (P0: Do First)

What

OWASP API Security Top 10 (2023) test against the /v1/chat/completions endpoint and all admin routes.

Why

We’ve tested the governance pipeline extensively but NOT the web application layer: FastAPI routes, auth flows, admin APIs, CORS, SSRF, etc.

Scope

OWASP API #	Risk	TapPass Surface	Status
API1: Broken Object Level Auth	Can agent A read agent B’s config?	`/v1/agents/{id}`, `/v1/pipelines/{id}`	❌ Untested
API2: Broken Authentication	JWT bypass, token reuse, key bruteforce	`/v1/auth/*`, Bearer tokens	⚠️ Basic only
API3: Broken Object Property Auth	Can user modify admin-only fields?	Pipeline config, rate limit overrides	❌ Untested
API4: Unrestricted Resource Consumption	Memory exhaustion, CPU spike	Large payloads, many connections	❌ Untested
API5: Broken Function Level Auth	Admin routes accessible to agents?	`/v1/admin/*`, `/v1/pipelines/create`	❌ Untested
API6: Unrestricted Access to Sensitive Business Flows	Abuse governance bypass	Repeated block attempts, policy enum	❌ Untested
API7: Server Side Request Forgery	Can LLM base_url be SSRF vector?	`base_url` config, webhook URLs	❌ Untested
API8: Security Misconfiguration	Headers, CORS, debug mode, verbose errors	All endpoints	⚠️ Headers exist, CORS unchecked
API9: Improper Inventory Management	Shadow/deprecated endpoints exposed?	All routes	❌ Untested
API10: Unsafe API Consumption	Trusting upstream LLM responses blindly?	OpenAI response parsing	⚠️ Output scanner exists
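
For the API7 row, the kind of check a pen tester would expect to find behind `base_url` and webhook settings can be sketched as below. This is a minimal illustration, not TapPass code: `is_safe_base_url` and `resolve_host` are hypothetical names, and the policy (reject any URL that resolves to a private, loopback, or link-local address) is an assumption about what the fix would look like.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def resolve_host(host):
    """Default resolver: return every IP address the hostname maps to."""
    infos = socket.getaddrinfo(host, None)
    return [info[4][0] for info in infos]


def is_safe_base_url(url, resolver=resolve_host):
    """Return True only if the URL is http(s) and every resolved IP is public."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        addresses = resolver(parsed.hostname)
    except OSError:
        return False  # unresolvable hosts are rejected (fail closed)
    if not addresses:
        return False
    for addr in addresses:
        ip = ipaddress.ip_address(addr)
        # 169.254.169.254 (cloud metadata) is caught by the link-local check.
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_reserved or ip.is_multicast or ip.is_unspecified):
            return False
    return True
```

A check like this only covers the address seen at validation time; the pen test should also try DNS rebinding and HTTP redirects to internal addresses, which this sketch does not handle.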

How

# Option A: Manual with tools
pip install httpie
# Test BOLA
http GET localhost:9620/v1/agents/OTHER_AGENT_ID Authorization:"Bearer agent_a_token"
# Test BFLA
http POST localhost:9620/v1/admin/pipelines Authorization:"Bearer regular_user_token"
# Test SSRF
# Set base_url to http://169.254.169.254/latest/meta-data/ (AWS metadata)

# Option B: Automated
# Use OWASP ZAP or Burp Suite against localhost:9620
# (on Linux, add --add-host=host.docker.internal:host-gateway to docker run)
docker run -t ghcr.io/zaproxy/zaproxy:stable zap-api-scan.py \
  -t http://host.docker.internal:9620/openapi.json -f openapi

# Option C: Hire external pentester (recommended for enterprise)
# Budget: €3,000–8,000 for a 3–5 day engagement

Specific Tests Needed

JWT token forgery (alg=none, key confusion)
Admin API key enumeration / timing attack
IDOR on agent/pipeline CRUD
SSRF via base_url or webhook URL
Path traversal on any file-serving routes
HTTP verb tampering (GET vs POST vs PUT vs DELETE)
Content-Type confusion (XML, form-data, etc.)
Request smuggling (HTTP/1.1 vs HTTP/2)
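
The first item in the list above (alg=none forgery) needs a test token. The sketch below builds one with the standard library only; the claim names (`sub`, `role`) are placeholders, since the real TapPass claim set is not described here. A correctly configured verifier must reject this token.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def forge_alg_none_token(claims: dict) -> str:
    """Build an unsigned JWT: header says alg=none, signature segment is empty."""
    header = {"alg": "none", "typ": "JWT"}
    segments = [
        b64url(json.dumps(header, separators=(",", ":")).encode()),
        b64url(json.dumps(claims, separators=(",", ":")).encode()),
        "",  # empty signature
    ]
    return ".".join(segments)


def decode_segment(segment: str) -> dict:
    """Decode one base64url JWT segment back into a dict (for inspection)."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))
```

Send the result as the Bearer token against any authenticated route; anything other than a 401/403 is a finding.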

2. LOAD & STRESS TESTING (P0: Do First)

What

Verify TapPass doesn’t degrade security under load. Does the PII scanner skip checks when overloaded? Do rate limits hold?

Why

Security under load is different from security at rest. Many systems silently skip expensive checks (Presidio NER, regex normalizer) when response time budgets are exceeded.

Tests

Test	Tool	Target	Pass Criteria
Sustained load	k6 / Locust	100 req/s for 5 min	No 5xx, p99 < 5s, PII still detected
Spike test	k6	0 → 500 req/s instant	Rate limiter engages, no crash
Soak test	k6	50 req/s for 1 hour	No memory leak, stable latency
PII under load	Custom	100 req/s all with SSN	100% detection rate maintained
Large payload flood	k6	49KB messages at 50 req/s	No OOM, all scanned
Concurrent same-agent	k6	200 concurrent from 1 agent	Rate limiter per-agent works
Connection exhaustion	Custom	10,000 half-open connections	Server stays responsive
Slow loris	slowhttptest	Slow headers/body	Uvicorn timeout works

How

# Install k6
brew install k6

# Basic load test
cat > load-test.js << 'EOF'
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 50 },
    { duration: '2m', target: 100 },
    { duration: '30s', target: 0 },
  ],
};

const API = 'http://localhost:9620/v1/chat/completions';
const HEADERS = {
  'Content-Type': 'application/json',
  'Authorization': `Bearer ${__ENV.TAPPASS_API_KEY}`,
};

export default function () {
  // Mix of clean + PII requests
  const hasPII = Math.random() > 0.5;
  const body = JSON.stringify({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: hasPII ? 'SSN: 234-56-7890' : 'What is 2+2?' }],
    max_tokens: 20,
  });

  const res = http.post(API, body, { headers: HEADERS, timeout: '30s' });

  check(res, {
    'status is 200': (r) => r.status === 200,
    'PII detected when sent': (r) => {
      if (!hasPII) return true;
      const tappass = JSON.parse(r.body).tappass || {};
      return (tappass.steps || []).some(s => s.step === 'detect_pii' && s.detected);
    },
  });
}
EOF
k6 run -e TAPPASS_API_KEY="$TAPPASS_ADMIN_API_KEY" load-test.js

Critical Question

Does PII detection rate drop below 100% under load?
If yes, we need circuit breakers that BLOCK requests when the scanner is overloaded, not skip scanning.
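
One way to get "block, don't skip" behaviour is an admission gate in front of the scanner. The sketch below is an assumption about how that could look, not the existing implementation: `FailClosedScannerGate`, `ScannerOverloaded`, and the `max_concurrent` limit are all hypothetical names.

```python
import threading


class ScannerOverloaded(Exception):
    """Raised when the scanner has no capacity; the caller must block the request."""


class FailClosedScannerGate:
    """Caps concurrent scans and rejects excess work instead of skipping the scan."""

    def __init__(self, scan_fn, max_concurrent):
        self._scan_fn = scan_fn
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def scan_or_block(self, text):
        # Non-blocking acquire: if no slot is free, refuse immediately.
        if not self._slots.acquire(blocking=False):
            raise ScannerOverloaded("scanner at capacity; request blocked")
        try:
            return self._scan_fn(text)
        finally:
            self._slots.release()
```

The request handler would translate `ScannerOverloaded` into a 503 (or 429) rather than forwarding the prompt upstream, so overload shows up in the load test as rejected requests and never as undetected PII.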

3. STATIC APPLICATION SECURITY TESTING: SAST (P1)

What

Automated code scanning for vulnerabilities in the Python source code.

Why

175 Python files. Manual review caught pipeline bugs, but didn’t systematically check for SQL injection, path traversal, insecure deserialization, hardcoded secrets, etc.

Tools

Tool	What It Finds	Cost
Bandit	Python-specific security issues (eval, exec, hardcoded passwords, weak crypto)	Free
Semgrep	Pattern-based bugs (OWASP rules, custom rules)	Free (OSS) / Paid (Cloud)
CodeQL	Deep dataflow analysis (taint tracking)	Free for OSS on GitHub
Ruff	Fast linter with some security rules	Free

How

# Bandit: Python security linter
pip install bandit
bandit -r tappass/ -ll -ii  # medium+ severity, medium+ confidence

# Semgrep: pattern-based scanner with OWASP rules
pip install semgrep
semgrep --config=auto tappass/
semgrep --config=p/owasp-top-ten tappass/
semgrep --config=p/python tappass/

# Both should run in CI (add to .github/workflows/ci.yml)

Known Issues to Check

eval() / exec() anywhere in codebase
yaml.load() without SafeLoader
pickle.loads() on untrusted data
Hardcoded secrets in source (not just .env)
os.path.join() with user input (path traversal)
subprocess with shell=True
Insecure random (random vs secrets)
Missing httponly/secure on cookies (if any)
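
For the `os.path.join()` item, the usual remediation is to resolve the joined path and verify it is still inside the intended directory. A minimal sketch, assuming a hypothetical helper named `safe_join` (no such function is known to exist in the codebase):

```python
from pathlib import Path


def safe_join(base_dir, user_path):
    """Join user input onto base_dir and refuse anything that escapes it."""
    base = Path(base_dir).resolve()
    # An absolute user_path replaces base entirely, so the check below catches it too.
    candidate = (base / user_path).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```

Any SAST finding on `os.path.join` or `open()` with request data can be triaged against this pattern: if the call site lacks an equivalent containment check, treat it as a real issue.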

4. SOFTWARE COMPOSITION ANALYSIS: SCA (P1)

What

Scan all Python dependencies for known CVEs.

Why

TapPass has ~25 direct dependencies, including cryptography, pyjwt, httpx, litellm, and spacy, all of which are attack surface. One vulnerable transitive dependency is enough to compromise the service.

How

# pip-audit (official PyPA tool)
pip install pip-audit
pip-audit -r requirements.txt

# Safety (alternative)
pip install safety
safety check -r requirements.txt

# Trivy (also scans Docker images)
brew install trivy
trivy fs --scanners vuln .
trivy image tappass:latest

# GitHub Dependabot: enable in repo settings (free)
# Snyk: more detailed, integrates with CI (free tier)

What We Already Know

.env is world-readable (644) → should be 600
OPENAI_API_KEY is in .env (known blocker C2)
Dependency versions use >= (floor only, no ceiling) → could pull vulnerable newer versions
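
The floor-only pinning issue is easy to audit mechanically. The sketch below flags requirement lines with no upper bound; `floor_only_requirements` is a hypothetical helper, and the sample versions in the usage are illustrative, not TapPass's actual pins. It treats `==`, `~=`, and `<` as providing a ceiling and does not handle URL requirements containing `#`.

```python
def floor_only_requirements(lines):
    """Return requirement lines that have no upper bound on the version."""
    flagged = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e)
            continue
        spec = line.split(";", 1)[0]  # ignore environment markers
        has_upper_bound = any(op in spec for op in ("==", "~=", "<"))
        if not has_upper_bound:
            flagged.append(line)
    return flagged


if __name__ == "__main__":
    sample = ["cryptography>=41.0", "pyjwt>=2.8,<3", "httpx==0.27.0", "spacy"]
    for requirement in floor_only_requirements(sample):
        print("no ceiling:", requirement)
```

Exact pins plus a lock file give the strongest guarantee; this check is only a quick way to see how many entries need attention.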

5. DYNAMIC APPLICATION SECURITY TESTING: DAST (P2)

What

Black-box automated vulnerability scanning against the running API.

Why

Tests the actual deployed surface: finds misconfigurations, missing headers, unexpected endpoints, and parameter pollution.

Tools

# OWASP ZAP (free, excellent for APIs)
docker run -t ghcr.io/zaproxy/zaproxy:stable zap-api-scan.py \
  -t http://host.docker.internal:9620/openapi.json \
  -f openapi -r report.html

# Nuclei (fast, template-based)
brew install nuclei
nuclei -u http://localhost:9620 -t http/ -t exposures/ -t misconfiguration/

# Nikto (web server scanner)
nikto -h http://localhost:9620

Specific Checks

OpenAPI spec exposed? (/docs, /redoc, /openapi.json)
Debug mode enabled? (/debug, /__debug__)
Admin routes without auth?
CORS misconfiguration (wildcard *?)
Missing security headers on all routes
Information disclosure in error responses
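
The header and CORS checks above can be scripted so they run against every route rather than a sample. The sketch below works on a plain dict of response headers; `audit_response_headers` is a hypothetical helper and the `REQUIRED_HEADERS` set is an assumption about which headers the TapPass middleware is supposed to emit.

```python
# Assumed baseline; adjust to match what the security headers middleware sets.
REQUIRED_HEADERS = (
    "strict-transport-security",
    "x-content-type-options",
    "x-frame-options",
    "content-security-policy",
)


def audit_response_headers(headers):
    """Return a list of findings for one response's headers (case-insensitive)."""
    normalized = {name.lower(): value for name, value in headers.items()}
    findings = [
        f"missing header: {name}"
        for name in REQUIRED_HEADERS
        if name not in normalized
    ]
    if normalized.get("access-control-allow-origin", "").strip() == "*":
        findings.append("CORS allows any origin (*)")
    return findings
```

Feed it the headers from each path listed in `/openapi.json`, including error responses, since middleware is sometimes bypassed on 4xx/5xx paths.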

6. FUZZING (P2)

What

Send malformed, random, and boundary-case inputs to every API parameter.

Why

Our tests use reasonable inputs. Fuzzers find crashes from inputs no human would think of: malformed JSON, binary data in string fields, extreme unicode, etc.

How

# Schemathesis: OpenAPI-aware fuzzer
pip install schemathesis
schemathesis run http://localhost:9620/openapi.json \
  --base-url http://localhost:9620 \
  --hypothesis-max-examples=1000 \
  -H "Authorization: Bearer $TAPPASS_ADMIN_API_KEY"

# RESTler: Microsoft's stateful REST API fuzzer
# https://github.com/microsoft/restler-fuzzer

# Custom fuzzer for PII patterns
python3 -c "
import random
# Generate random strings that ALMOST match SSN/email/CC patterns
# to find regex edge cases
for _ in range(10000):
    # Random digit groups
    a,b,c = random.randint(0,999), random.randint(0,99), random.randint(0,9999)
    sep = random.choice(['-','.','/','|','_',' ','\t'])
    print(f'{a:03d}{sep}{b:02d}{sep}{c:04d}')
"

Targets

messages[].content: arbitrary strings, binary, nulls
messages[].role: unknown roles, empty, very long
model: SQL injection, path traversal, command injection
max_tokens: negative, zero, MAX_INT, float, string
temperature: extreme values
Headers: very long, duplicate, conflicting
Body: malformed JSON, XML, form-data
Encoding: invalid UTF-8, mixed encodings
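
For the scalar targets above, a small generator that mutates one field at a time makes crashes easy to attribute. This is a sketch under assumptions: `fuzz_payloads`, `BASE_REQUEST`, and the specific boundary values in `MUTATIONS` are illustrative choices, not an exhaustive corpus.

```python
import copy
import json

BASE_REQUEST = {
    "model": "gpt-4o-mini",
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "max_tokens": 20,
}

# Illustrative boundary values per field; extend as needed.
MUTATIONS = {
    "max_tokens": [-1, 0, 2**31 - 1, 2**63, 1.5, "20", None],
    "temperature": [-1.0, 0.0, 2.0, 1e308, "hot"],
    "model": ["", "../../etc/passwd", "gpt'; DROP TABLE agents;--", "a" * 10_000],
}


def fuzz_payloads():
    """Yield (field, json_body) pairs, each differing from the base in one field."""
    for field, values in MUTATIONS.items():
        for value in values:
            payload = copy.deepcopy(BASE_REQUEST)
            payload[field] = value
            yield field, json.dumps(payload)
```

Each body can be POSTed to `/v1/chat/completions`; the pass criterion is a clean 4xx with no stack trace, never a 5xx or a hang.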

7. CHAOS & RESILIENCE TESTING (P2)

What

What happens when components fail? Does security degrade gracefully or fail open?

Critical Questions

Scenario	Expected Behavior	Risk if Wrong
OPA goes down	Block all requests (fail closed)	Policies bypassed
Presidio/spaCy crashes	Block requests with PII-like content	PII leaks through
OpenAI API timeout	Return error, don’t leak partial response	Partial data exposure
Redis/store unavailable	Rate limits unenforced?	DoS possible
Disk full	Logs stop, but scanning continues	Audit trail gap
Out of memory	Process killed, restarts clean	Stale state

How

# Kill OPA and send PII request
pkill -f opa
curl -X POST localhost:9620/v1/chat/completions \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"SSN 234-56-7890"}]}'
# MUST return error, not forward unscanned

# Simulate slow LLM (add 30s delay)
# Does the output scanner still run on timeout responses?

# Memory pressure
stress --vm 2 --vm-bytes 4G --timeout 60s &
# Send PII request during memory pressure

The Golden Rule

Every failure mode must FAIL CLOSED. If any component is down, requests must be blocked, never forwarded unscanned.
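
In code, failing closed means that an exception or an ambiguous answer from a check is treated exactly like a denial. A minimal sketch, assuming each pipeline check is a callable that returns `True` to allow; the `fail_closed` decorator and `Blocked` exception are hypothetical names:

```python
import functools


class Blocked(Exception):
    """The request must not be forwarded."""


def fail_closed(check_fn):
    """Wrap a check so that errors and non-True results both block the request."""

    @functools.wraps(check_fn)
    def wrapper(*args, **kwargs):
        try:
            allowed = check_fn(*args, **kwargs)
        except Exception as exc:
            # Component down, timeout, bad response: block rather than skip.
            raise Blocked(f"{check_fn.__name__} unavailable: {exc!r}") from exc
        if allowed is not True:
            # None, False, or any unexpected value counts as a denial.
            raise Blocked(f"{check_fn.__name__} denied the request")
        return True

    return wrapper
```

The chaos tests in this section then reduce to one assertion per scenario: with the component killed, the endpoint returns an error and the upstream LLM receives nothing.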

8. INFRASTRUCTURE HARDENING (P1)

Current Issues Found

Issue	Severity	Fix
`.env` is 644 (world-readable)	Medium	`chmod 600 .env`
`OPENAI_API_KEY` accessible to all processes	High	Egress firewall or remove from `.env`
No TLS (HTTP only)	High	Add TLS termination (nginx/caddy)
No request size limit at reverse proxy level	Medium	Add `client_max_body_size` in nginx
Uvicorn runs as user process, not containerized	Medium	Use Docker with non-root user
No health check endpoint monitoring	Low	Add uptime monitoring
OPA runs without TLS	Medium	Enable OPA TLS
No log rotation	Low	Add logrotate or Docker log driver
OpenAPI docs exposed in production	Low	Disable `/docs` and `/redoc`
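
The `.env` permission issue is worth enforcing at startup so it cannot regress. A minimal POSIX-only sketch; `is_private_file` is a hypothetical helper, not an existing TapPass function:

```python
import os
import stat


def is_private_file(path):
    """True if neither group nor others have any permission bits on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0
```

The service could refuse to start when `is_private_file(".env")` is false, which turns a silent misconfiguration into an immediate, visible failure.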

Docker Hardening

# Non-root user
RUN addgroup -S tappass && adduser -S tappass -G tappass
USER tappass

# Read-only filesystem
docker run --read-only --tmpfs /tmp tappass

# No new privileges
docker run --security-opt no-new-privileges tappass

# Resource limits
docker run --memory=2g --cpus=2 tappass

9. COMPLIANCE TESTING (P2)

SOC 2 Type II Requirements

Control	TapPass Status	Gap
Access control	✅ API keys, agent registration	Need RBAC audit trail
Encryption in transit	❌ No TLS	Need TLS
Encryption at rest	⚠️ Memory storage only	Need encrypted persistent store
Audit logging	⚠️ structlog exists	Need tamper-proof audit log
Incident response	❌ No alerting	Need PagerDuty/Slack alerts
Change management	⚠️ Git + CI	Need approval gates
Vendor management	❌ No SLA with OpenAI	Need vendor risk assessment
Business continuity	❌ Single instance	Need HA / failover

GDPR Requirements

Requirement	Status	Gap
Data minimization	✅ PII redacted before LLM	✓
Right to erasure	❌ No data deletion API	Need purge endpoint
Data processing records	⚠️ Logs exist	Need structured ROPA
DPA with OpenAI	❌	Need signed DPA
Cross-border transfer	⚠️ Data goes to OpenAI (US)	Need SCCs or EU hosting
Breach notification	❌ No detection	Need monitoring + alerting

10. STREAMING RESPONSE SCANNING (P1)

What

When stream: true is set, does the output scanner work on streamed chunks?

Why

Most production deployments use streaming. If the output scanner only works on complete responses, PII leaks through in stream mode.

Test

curl -X POST localhost:9620/v1/chat/completions \
  -H "Authorization: Bearer $KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","stream":true,"messages":[{"role":"user","content":"Generate a fake SSN"}]}' \
  --no-buffer
# Watch: does SSN appear in chunks before being redacted?

Risk

The LLM might output “234” in chunk 1, “-56-” in chunk 2, “7890” in chunk 3. No single chunk contains a full SSN, so per-chunk scanning misses it.

Fix Options

Buffer full response, scan, then stream (adds latency)
Sliding window scanner on accumulated chunks (complex)
Disable streaming when output scanning is enabled (simplest)
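
The sliding-window option can be sketched as a small stateful redactor that holds back just enough text to catch a pattern split across chunks. This is an illustration under assumptions: it handles only a fixed-width SSN pattern, and `StreamingRedactor` and `HOLD_BACK` are hypothetical names. Variable-length patterns (emails, names found by NER) need a larger or adaptive hold-back.

```python
import re

SSN = re.compile(r"\d{3}-\d{2}-\d{4}")
# Longest possible match is 11 characters, so an incomplete match can occupy
# at most the last 10 characters of the buffer.
HOLD_BACK = 10


class StreamingRedactor:
    """Redacts SSNs in a chunked stream, emitting text only once it is safe."""

    def __init__(self):
        self._buffer = ""

    def feed(self, chunk):
        """Add a chunk; return the prefix that can be sent to the client now."""
        self._buffer = SSN.sub("[REDACTED]", self._buffer + chunk)
        if len(self._buffer) <= HOLD_BACK:
            return ""
        safe = self._buffer[:-HOLD_BACK]
        self._buffer = self._buffer[-HOLD_BACK:]
        return safe

    def flush(self):
        """Call at end of stream; returns the remaining (already scanned) tail."""
        tail, self._buffer = self._buffer, ""
        return tail
```

The added latency is bounded by the hold-back length rather than the full response, which is the main advantage over buffering everything before streaming.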

11. CONCURRENT RACE CONDITIONS (P2)

What

Thread-safety of shared state: rate limit counters, session tracking, scanner caches.

Tests

# Send 100 concurrent requests from the same agent
# Check: rate limiter counts correctly? Session tracking isolated?
ab -n 100 -c 50 -H "Authorization: Bearer $KEY" \
   -T application/json -p payload.json \
   http://localhost:9620/v1/chat/completions

# Race condition on pipeline context
# Can two concurrent requests share a PipelineContext?

Specific Risks

Rate limit counter race (100 requests, only 60 should pass)
Session tracking cross-contamination
Scanner cache serving stale results
Concurrent writes to audit log (interleaved entries)
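
The first risk (100 requests, only 60 should pass) can be reproduced in a unit test before touching the live service. The sketch below shows a lock-protected counter and a thread-based harness; `FixedWindowCounter` and `hammer` are hypothetical names, window reset is omitted, and the real limiter may be asyncio-based or store-backed, in which case the same check-then-increment atomicity has to be enforced there instead.

```python
import threading


class FixedWindowCounter:
    """Allows at most `limit` acquisitions; check and increment are atomic."""

    def __init__(self, limit):
        self._limit = limit
        self._count = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        with self._lock:
            if self._count >= self._limit:
                return False
            self._count += 1
            return True


def hammer(counter, requests):
    """Fire `requests` concurrent attempts and return how many were allowed."""
    results = []
    results_lock = threading.Lock()

    def worker():
        allowed = counter.try_acquire()
        with results_lock:
            results.append(allowed)

    threads = [threading.Thread(target=worker) for _ in range(requests)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results.count(True)
```

Running `hammer(FixedWindowCounter(60), 100)` should always yield exactly 60; the same harness pointed at the real limiter shows whether its counter is racy.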

12. CI SECURITY GATES (P1)

What

Add automated security checks to .github/workflows/ci.yml.

Recommended Pipeline

jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # SAST
      - name: Bandit
        run: |
          pip install bandit
          bandit -r tappass/ -ll -ii -f json -o bandit.json

      # SCA
      - name: pip-audit
        run: |
          pip install pip-audit
          pip-audit -r requirements.txt --strict

      # Semgrep
      - name: Semgrep
        uses: returntocorp/semgrep-action@v1
        with:
          config: >-
            p/python
            p/owasp-top-ten
            p/security-audit

      # Container scanning
      - name: Trivy
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: fs
          severity: HIGH,CRITICAL

      # Secret scanning
      - name: Gitleaks
        uses: gitleaks/gitleaks-action@v2

      # DAST (on deployed preview)
      - name: ZAP API scan
        uses: zaproxy/action-api-scan@v0.9.0
        with:
          target: http://localhost:9620/openapi.json

Priority Matrix

           IMPACT
    High ┌────────────────────────────────┐
         │                                │
         │  P0: Pen test                  │
         │  P0: Load test (PII under load)│
         │  P1: Streaming scan            │
         │  P1: SAST (Bandit/Semgrep)     │
         │  P1: SCA (pip-audit)           │
         │  P1: Infra hardening (TLS,.env)│
         │  P1: CI security gates         │
         │                                │
    Med  │  P2: DAST (ZAP)                │
         │  P2: Fuzzing                   │
         │  P2: Chaos testing             │
         │  P2: Race conditions           │
         │  P2: Compliance audit          │
         │                                │
    Low  │  P3: Supply chain (SBOM)       │
         │  P3: Red team social eng       │
         │                                │
         └────────────────────────────────┘
              Easy ──────────── Hard
                    EFFORT

Estimated Effort

Category	Tool/Method	Time	Cost
Pen test	External firm	3–5 days	€3,000–8,000
Pen test	Self (OWASP ZAP + manual)	2–3 days	Free
Load test	k6	1 day	Free
SAST	Bandit + Semgrep	2 hours setup + fix cycle	Free
SCA	pip-audit + Trivy	30 min setup	Free
DAST	ZAP API scan	2 hours	Free
Fuzzing	Schemathesis	1 day	Free
Chaos test	Manual scripts	1 day	Free
Infra hardening	TLS + Docker + .env	1 day	Free
Streaming scan	Code change	2–3 days	Free
CI gates	GitHub Actions	2 hours	Free
Compliance	Audit prep	2–4 weeks	€5,000–20,000 (auditor)

Total self-service: ~8–10 days of work
Total with external pen test + compliance: add €8,000–28,000

Quick Wins (Do Today)

# 1. Fix .env permissions
chmod 600 .env

# 2. Install and run Bandit
pip install bandit && bandit -r tappass/ -ll

# 3. Install and run pip-audit
pip install pip-audit && pip-audit

# 4. Disable OpenAPI docs in production
# Set TAPPASS_DOCS_ENABLED=false

# 5. Test streaming PII leakage
curl -N localhost:9620/v1/chat/completions \
  -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o-mini","stream":true,"messages":[{"role":"user","content":"Generate a fake customer SSN"}]}'
