
TapPass Security Testing Roadmap

Status as of March 2026, post Round 12.


DONE │ TODO
✅ Functional security (R1–R12, 300+ tests) │ ❌ Penetration testing (OWASP API)
✅ PII detection (24 obfuscation techniques) │ ❌ Load / stress testing
✅ Injection detection (threshold tuned) │ ❌ SAST (static analysis)
✅ Output scanning (normalizer, decoder) │ ❌ SCA (dependency CVE scan)
✅ Role coverage (6 roles) │ ❌ DAST (automated vuln scanning)
✅ Multimodal content support │ ❌ Fuzzing
✅ ReDoS resistance (verified <2s) │ ❌ Chaos / resilience testing
✅ Rate limiting (basic) │ ❌ Infrastructure hardening
✅ Auth enforcement │ ❌ Compliance audit (SOC2/GDPR)
✅ Security headers middleware │ ❌ Secrets rotation & management
✅ Error sanitization (no stack traces) │ ❌ Streaming response scanning
✅ Basic edge cases (null, empty, large) │ ❌ Concurrent race conditions
│ ❌ CI security gates

Test the /v1/chat/completions endpoint and all admin routes against the OWASP API Security Top 10 (2023).

We’ve tested the governance pipeline extensively but NOT the web application layer: FastAPI routes, auth flows, admin APIs, CORS, SSRF, etc.

OWASP API # │ Risk │ TapPass Surface │ Status
API1: Broken Object Level Auth │ Can agent A read agent B’s config? │ /v1/agents/{id}, /v1/pipelines/{id} │ ❌ Untested
API2: Broken Authentication │ JWT bypass, token reuse, key brute force │ /v1/auth/*, Bearer tokens │ ⚠️ Basic only
API3: Broken Object Property Level Auth │ Can user modify admin-only fields? │ Pipeline config, rate limit overrides │ ❌ Untested
API4: Unrestricted Resource Consumption │ Memory exhaustion, CPU spike │ Large payloads, many connections │ ❌ Untested
API5: Broken Function Level Auth │ Admin routes accessible to agents? │ /v1/admin/*, /v1/pipelines/create │ ❌ Untested
API6: Unrestricted Access to Sensitive Business Flows │ Abuse governance bypass │ Repeated block attempts, policy enum │ ❌ Untested
API7: Server Side Request Forgery │ Can LLM base_url be SSRF vector? │ base_url config, webhook URLs │ ❌ Untested
API8: Security Misconfiguration │ Headers, CORS, debug mode, verbose errors │ All endpoints │ ⚠️ Headers exist, CORS unchecked
API9: Improper Inventory Management │ Shadow/deprecated endpoints exposed? │ All routes │ ❌ Untested
API10: Unsafe Consumption of APIs │ Trusting upstream LLM responses blindly? │ OpenAI response parsing │ ⚠️ Output scanner exists
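For the API7 row, a minimal sketch of the kind of check a base_url or webhook URL validator could apply before any outbound request is made. The function name `is_obviously_internal` is hypothetical and not part of TapPass; it only catches literal IP addresses and `localhost`, so real hostnames would still need to be resolved and re-checked at connect time.

```python
import ipaddress
from urllib.parse import urlparse


def is_obviously_internal(url: str) -> bool:
    """Return True when the URL host is a literal internal address.

    Hypothetical helper: hostnames other than 'localhost' return False here
    and must be resolved and checked again before connecting.
    """
    host = urlparse(url).hostname
    if host is None:
        return True  # unparseable target: treat as unsafe
    if host.lower() == "localhost":
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # not a literal IP; needs DNS resolution + re-check
    return ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved
```

A pen test would then try to get around such a filter, for example with DNS names that resolve to 169.254.169.254 or with redirects from an allowed host.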
# Option A: Manual with tools
pip install httpie
# Test BOLA
http GET localhost:9620/v1/agents/OTHER_AGENT_ID Authorization:"Bearer agent_a_token"
# Test BFLA
http POST localhost:9620/v1/admin/pipelines Authorization:"Bearer regular_user_token"
# Test SSRF
# Set base_url to http://169.254.169.254/latest/meta-data/ (AWS metadata)
# Option B: Automated
# Use OWASP ZAP or Burp Suite against localhost:9620
# On Linux, add --add-host=host.docker.internal:host-gateway so the container can reach the host
docker run -t ghcr.io/zaproxy/zaproxy:stable zap-api-scan.py \
  -t http://host.docker.internal:9620/openapi.json -f openapi
# Option C: Hire external pentester (recommended for enterprise)
# Budget: €3,000–8,000 for a 3–5 day engagement
  • JWT token forgery (alg=none, key confusion)
  • Admin API key enumeration / timing attack
  • IDOR on agent/pipeline CRUD
  • SSRF via base_url or webhook URL
  • Path traversal on any file-serving routes
  • HTTP verb tampering (GET vs POST vs PUT vs DELETE)
  • Content-Type confusion (XML, form-data, etc.)
  • Request smuggling (HTTP/1.1 vs HTTP/2)

Verify TapPass doesn’t degrade security under load. Does the PII scanner skip checks when overloaded? Do rate limits hold?

Security under load is different from security at rest. Many systems silently skip expensive checks (Presidio NER, regex normalizer) when response time budgets are exceeded.

Test │ Tool │ Target │ Pass Criteria
Sustained load │ k6 / Locust │ 100 req/s for 5 min │ No 5xx, p99 < 5s, PII still detected
Spike test │ k6 │ 0 → 500 req/s instant │ Rate limiter engages, no crash
Soak test │ k6 │ 50 req/s for 1 hour │ No memory leak, stable latency
PII under load │ Custom │ 100 req/s all with SSN │ 100% detection rate maintained
Large payload flood │ k6 │ 49KB messages at 50 req/s │ No OOM, all scanned
Concurrent same-agent │ k6 │ 200 concurrent from 1 agent │ Rate limiter per-agent works
Connection exhaustion │ Custom │ 10,000 half-open connections │ Server stays responsive
Slow loris │ slowhttptest │ Slow headers/body │ Uvicorn timeout works
# Install k6
brew install k6
# Basic load test
cat > load-test.js << 'EOF'
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  stages: [
    { duration: '30s', target: 50 },
    { duration: '2m', target: 100 },
    { duration: '30s', target: 0 },
  ],
};

const API = 'http://localhost:9620/v1/chat/completions';
const HEADERS = {
  'Content-Type': 'application/json',
  'Authorization': `Bearer ${__ENV.TAPPASS_API_KEY}`,
};

export default function () {
  // Mix of clean + PII requests
  const hasPII = Math.random() > 0.5;
  const body = JSON.stringify({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: hasPII ? 'SSN: 234-56-7890' : 'What is 2+2?' }],
    max_tokens: 20,
  });
  const res = http.post(API, body, { headers: HEADERS, timeout: '30s' });
  check(res, {
    'status is 200': (r) => r.status === 200,
    'PII detected when sent': (r) => {
      if (!hasPII) return true;
      // Treat a missing or non-JSON body as a failed check instead of throwing
      let tappass = {};
      try {
        tappass = (JSON.parse(r.body) || {}).tappass || {};
      } catch (e) {
        return false;
      }
      return (tappass.steps || []).some((s) => s.step === 'detect_pii' && s.detected);
    },
  });
}
EOF
k6 run -e TAPPASS_API_KEY="$TAPPASS_ADMIN_API_KEY" load-test.js

Does PII detection rate drop below 100% under load?
If yes, we need circuit breakers that BLOCK requests when the scanner is overloaded, not skip scanning.
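A minimal sketch of that fail-closed behavior, assuming the scanner is a plain callable that returns a list of findings. `scan_or_block`, the result shape, and the reason strings are hypothetical names for illustration, not the TapPass implementation. The point is that a timeout or scanner error produces a block decision and never an unscanned forward.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as ScanTimeout

# Shared worker pool for scanner calls (size is an arbitrary assumption)
_POOL = ThreadPoolExecutor(max_workers=4)


def scan_or_block(scan, text: str, budget_s: float) -> dict:
    """Run scan(text) within budget_s seconds; block on timeout or error."""
    future = _POOL.submit(scan, text)
    try:
        findings = future.result(timeout=budget_s)
    except ScanTimeout:
        future.cancel()  # best effort; a running scan cannot be interrupted
        return {"forward": False, "reason": "scanner_timeout"}
    except Exception:
        return {"forward": False, "reason": "scanner_error"}
    if findings:
        return {"forward": False, "reason": "pii_detected"}
    return {"forward": True, "reason": "clean"}
```

Under sustained overload a timed-out scan still occupies a worker, so a production version would also need to cap queue depth and reject early.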


3. STATIC APPLICATION SECURITY TESTING: SAST (P1)


Automated code scanning for vulnerabilities in the Python source code.

175 Python files. Manual review caught pipeline bugs, but didn’t systematically check for SQL injection, path traversal, insecure deserialization, hardcoded secrets, etc.

Tool │ What It Finds │ Cost
Bandit │ Python-specific security issues (eval, exec, hardcoded passwords, weak crypto) │ Free
Semgrep │ Pattern-based bugs (OWASP rules, custom rules) │ Free (OSS) / Paid (Cloud)
CodeQL │ Deep dataflow analysis (taint tracking) │ Free for OSS on GitHub
Ruff │ Fast linter with some security rules │ Free
# Bandit: Python security linter
pip install bandit
bandit -r tappass/ -ll -ii # medium+ severity, medium+ confidence
# Semgrep: pattern-based scanner with OWASP rules
pip install semgrep
semgrep --config=auto tappass/
semgrep --config=p/owasp-top-ten tappass/
semgrep --config=p/python tappass/
# Both should run in CI (add to .github/workflows/ci.yml)
  • eval() / exec() anywhere in codebase
  • yaml.load() without SafeLoader
  • pickle.loads() on untrusted data
  • Hardcoded secrets in source (not just .env)
  • os.path.join() with user input (path traversal)
  • subprocess with shell=True
  • Insecure random (random vs secrets)
  • Missing httponly/secure on cookies (if any)
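To make the first item in the list concrete, here is a small sketch of how a static check for direct eval()/exec() calls works, using the standard `ast` module. It illustrates the technique only and is no substitute for Bandit or Semgrep; `find_risky_calls` is a hypothetical helper name.

```python
import ast

RISKY_BUILTINS = {"eval", "exec"}


def find_risky_calls(source: str) -> list:
    """Return sorted (line, name) pairs for direct eval()/exec() calls."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in RISKY_BUILTINS
        ):
            hits.append((node.lineno, node.func.id))
    return sorted(hits)
```

This only sees calls by bare name; aliased or attribute-based calls (for example `builtins.eval`) need the dataflow analysis that the dedicated tools provide.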

4. SOFTWARE COMPOSITION ANALYSIS: SCA (P1)


Scan all Python dependencies for known CVEs.

TapPass has ~25 direct dependencies, including cryptography, pyjwt, httpx, litellm, and spacy, and every one of them is attack surface. A single vulnerable transitive dependency is enough to compromise the service.

# pip-audit (official PyPA tool)
pip install pip-audit
pip-audit -r requirements.txt
# Safety (alternative; Safety 3.x deprecates `check` in favor of `safety scan`)
pip install safety
safety check -r requirements.txt
# Trivy (also scans Docker images)
brew install trivy
trivy fs --scanners vuln .
trivy image tappass:latest
# GitHub Dependabot: enable in repo settings (free)
# Snyk: more detailed, integrates with CI (free tier)
  • .env is world-readable (644) → should be 600
  • OPENAI_API_KEY is in .env (known blocker C2)
  • Dependency versions use >= (floor only, no ceiling) → could pull vulnerable newer versions
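The last finding (floor-only version specifiers) is easy to check mechanically. The sketch below flags requirement lines that have no pin or upper bound; the helper name and the sample versions in the usage note are illustrative assumptions, not taken from the TapPass requirements.txt.

```python
import re

# A line counts as bounded if it pins (==), uses a compatible release (~=),
# or sets an upper limit (< or <=).
UPPER_BOUND = re.compile(r"==|~=|<")


def floor_only_requirements(lines) -> list:
    """Return requirement specs that have no pin and no upper bound."""
    flagged = []
    for raw in lines:
        # Drop comments and environment markers before inspecting the spec
        spec = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not spec or spec.startswith("-"):
            continue  # blank line or pip option such as -r / -e
        if not UPPER_BOUND.search(spec):
            flagged.append(spec)
    return flagged
```

For example, `floor_only_requirements(["cryptography>=42.0", "pyjwt>=2.8,<3"])` flags only the first entry. Pinning exact versions in a lock file and letting Dependabot propose upgrades is a common way to close this gap.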

5. DYNAMIC APPLICATION SECURITY TESTING: DAST (P2)


Black-box automated vulnerability scanning against the running API.

Tests the actual deployed surface: finds misconfigurations, missing headers, unexpected endpoints, parameter pollution.

# OWASP ZAP (free, excellent for APIs)
docker run -t ghcr.io/zaproxy/zaproxy:stable zap-api-scan.py \
  -t http://host.docker.internal:9620/openapi.json \
  -f openapi -r report.html
# Nuclei (fast, template-based)
brew install nuclei
nuclei -u http://localhost:9620 -t http/ -t exposures/ -t misconfiguration/
# Nikto (web server scanner)
nikto -h http://localhost:9620
  • OpenAPI spec exposed? (/docs, /redoc, /openapi.json)
  • Debug mode enabled? (/debug, /__debug__)
  • Admin routes without auth?
  • CORS misconfiguration (wildcard *?)
  • Missing security headers on all routes
  • Information disclosure in error responses

Send malformed, random, and boundary-case inputs to every API parameter.

Our tests use reasonable inputs. Fuzzers find crashes from inputs no human would think of: malformed JSON, binary data in string fields, extreme unicode, etc.

# Schemathesis: OpenAPI-aware fuzzer
pip install schemathesis
schemathesis run http://localhost:9620/openapi.json \
  --base-url http://localhost:9620 \
  --hypothesis-max-examples=1000 \
  -H "Authorization: Bearer $TAPPASS_ADMIN_API_KEY"
# RESTler: Microsoft's stateful REST API fuzzer
# https://github.com/microsoft/restler-fuzzer
# Custom fuzzer for PII patterns
python3 -c "
import random
# Generate random strings that ALMOST match the SSN pattern
# to find regex edge cases (extend the same idea to email/CC patterns)
for _ in range(10000):
    # Random digit groups
    a, b, c = random.randint(0, 999), random.randint(0, 99), random.randint(0, 9999)
    sep = random.choice(['-', '.', '/', '|', '_', ' ', '\t'])
    print(f'{a:03d}{sep}{b:02d}{sep}{c:04d}')
"
  • messages[].content: arbitrary strings, binary, nulls
  • messages[].role: unknown roles, empty, very long
  • model: SQL injection, path traversal, command injection
  • max_tokens: negative, zero, MAX_INT, float, string
  • temperature: extreme values
  • Headers: very long, duplicate, conflicting
  • Body: malformed JSON, XML, form-data
  • Encoding: invalid UTF-8, mixed encodings
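The targets above can be turned into request bodies mechanically. This sketch combines a few boundary values for `max_tokens` and `messages[].role`; the specific values are illustrative assumptions, and the bodies still need to be sent with an HTTP client of your choice.

```python
import itertools
import json

# Boundary cases drawn from the list above (values are illustrative)
MAX_TOKENS_CASES = [-1, 0, 2**31 - 1, 2**63, 1.5, "20", None]
ROLE_CASES = ["", "root", "user\u0000", "u" * 10_000]


def fuzz_bodies(model: str = "gpt-4o-mini"):
    """Yield JSON request bodies covering every max_tokens/role combination."""
    for max_tokens, role in itertools.product(MAX_TOKENS_CASES, ROLE_CASES):
        yield json.dumps({
            "model": model,
            "messages": [{"role": role, "content": "What is 2+2?"}],
            "max_tokens": max_tokens,
        })
```

Each body should produce a 4xx validation error, never a 5xx or a forwarded request.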

What happens when components fail? Does security degrade gracefully or fail open?

Scenario │ Expected Behavior │ Risk if Wrong
OPA goes down │ Block all requests (fail closed) │ Policies bypassed
Presidio/spaCy crashes │ Block requests with PII-like content │ PII leaks through
OpenAI API timeout │ Return error, don’t leak partial response │ Partial data exposure
Redis/store unavailable │ Rate limits unenforced? │ DoS possible
Disk full │ Logs stop, but scanning continues │ Audit trail gap
Out of memory │ Process killed, restarts clean │ Stale state
# Kill OPA and send PII request
pkill -f opa
curl -X POST localhost:9620/v1/chat/completions \
-H "Authorization: Bearer $KEY" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"SSN 234-56-7890"}]}'
# MUST return error, not forward unscanned
# Simulate slow LLM (add 30s delay)
# Does the output scanner still run on timeout responses?
# Memory pressure
stress --vm 2 --vm-bytes 4G --timeout 60s &
# Send PII request during memory pressure

Every failure mode must FAIL CLOSED. If any component is down, requests must be blocked, never forwarded unscanned.


Issue │ Severity │ Fix
.env is 644 (world-readable) │ Medium │ chmod 600 .env
OPENAI_API_KEY accessible to all processes │ High │ Egress firewall or remove from .env
No TLS (HTTP only) │ High │ Add TLS termination (nginx/caddy)
No request size limit at reverse proxy level │ Medium │ Add client_max_body_size in nginx
Uvicorn runs as user process, not containerized │ Medium │ Use Docker with non-root user
No health check endpoint monitoring │ Low │ Add uptime monitoring
OPA runs without TLS │ Medium │ Enable OPA TLS
No log rotation │ Low │ Add logrotate or Docker log driver
OpenAPI docs exposed in production │ Low │ Disable /docs and /redoc
# Non-root user (Dockerfile; addgroup/adduser -S is Alpine/BusyBox syntax)
RUN addgroup -S tappass && adduser -S tappass -G tappass
USER tappass
# Read-only filesystem
docker run --read-only --tmpfs /tmp tappass
# No new privileges
docker run --security-opt no-new-privileges tappass
# Resource limits
docker run --memory=2g --cpus=2 tappass
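The .env permission finding can also be enforced at startup instead of relying on a one-off chmod. A minimal sketch, assuming a POSIX filesystem; `is_owner_only` is a hypothetical helper name.

```python
import os
import stat


def is_owner_only(path: str) -> bool:
    """Return True when group and others have no permission bits on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & 0o077) == 0
```

A service could call `is_owner_only(".env")` during boot and refuse to start when it returns False, which turns the 644 finding into a hard failure.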

Control │ TapPass Status │ Gap
Access control │ ✅ API keys, agent registration │ Need RBAC audit trail
Encryption in transit │ ❌ No TLS │ Need TLS
Encryption at rest │ ⚠️ Memory storage only │ Need encrypted persistent store
Audit logging │ ⚠️ structlog exists │ Need tamper-proof audit log
Incident response │ ❌ No alerting │ Need PagerDuty/Slack alerts
Change management │ ⚠️ Git + CI │ Need approval gates
Vendor management │ ❌ No SLA with OpenAI │ Need vendor risk assessment
Business continuity │ ❌ Single instance │ Need HA / failover

Requirement │ Status │ Gap
Data minimization │ ✅ PII redacted before LLM │
Right to erasure │ ❌ No data deletion API │ Need purge endpoint
Data processing records │ ⚠️ Logs exist │ Need structured ROPA
DPA with OpenAI │ │ Need signed DPA
Cross-border transfer │ ⚠️ Data goes to OpenAI (US) │ Need SCCs or EU hosting
Breach notification │ ❌ No detection │ Need monitoring + alerting

When stream: true is set, does the output scanner work on streamed chunks?

Most production deployments use streaming. If the output scanner only works on complete responses, PII leaks through in stream mode.

curl -X POST localhost:9620/v1/chat/completions \
-H "Authorization: Bearer $KEY" \
-H "Content-Type: application/json" \
-d '{"model":"gpt-4o-mini","stream":true,"messages":[{"role":"user","content":"Generate a fake SSN"}]}' \
--no-buffer
# Watch: does SSN appear in chunks before being redacted?

The LLM might output “234” in chunk 1, “-56-” in chunk 2, “7890” in chunk 3. No single chunk contains a full SSN, so per-chunk scanning misses it.

  1. Buffer full response, scan, then stream (adds latency)
  2. Sliding window scanner on accumulated chunks (complex)
  3. Disable streaming when output scanning is enabled (simplest)
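Option 2 can be sketched as a generator that withholds the tail of the accumulated text until it can no longer be the start of an SSN. This is an illustration under simplifying assumptions: a single SSN pattern, no word-boundary handling (so it over-matches inside longer digit runs), and hypothetical names throughout. It is not the TapPass output scanner.

```python
import re

SSN_RE = re.compile(r"\d{3}-\d{2}-\d{4}")
# A complete SSN is 11 characters, so an incomplete one is at most 10.
HOLDBACK = 10


def scan_stream(chunks):
    """Yield redacted text, holding back a tail that may be a partial SSN."""
    buf = ""
    for chunk in chunks:
        buf = SSN_RE.sub("[REDACTED]", buf + chunk)
        safe_len = len(buf) - HOLDBACK
        if safe_len > 0:
            yield buf[:safe_len]  # cannot contain the start of a pending SSN
            buf = buf[safe_len:]
    if buf:
        yield buf  # stream ended; the tail has already been scanned
```

With the chunking from the example above, `"".join(scan_stream(["My SSN is 234", "-56-", "7890 ok"]))` contains `[REDACTED]` in place of the number. The cost is a fixed delay of `HOLDBACK` characters, which grows with the longest pattern the scanner must catch.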

Thread-safety of shared state: rate limit counters, session tracking, scanner caches.

# Send 100 concurrent requests from the same agent
# Check: rate limiter counts correctly? Session tracking isolated?
ab -n 100 -c 50 -H "Authorization: Bearer $KEY" \
-T application/json -p payload.json \
http://localhost:9620/v1/chat/completions
# Race condition on pipeline context
# Can two concurrent requests share a PipelineContext?
  • Rate limit counter race (100 requests, only 60 should pass)
  • Session tracking cross-contamination
  • Scanner cache serving stale results
  • Concurrent writes to audit log (interleaved entries)
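The first item describes a classic check-then-increment race. A sketch of a counter that keeps the check and the increment under one lock is shown below; `CountingLimiter` is a hypothetical class that omits window resets and per-agent keys, and it only applies to state shared between threads of one process.

```python
import threading


class CountingLimiter:
    """Allow at most `limit` calls; check and increment happen atomically."""

    def __init__(self, limit: int):
        self.limit = limit
        self.count = 0
        self._lock = threading.Lock()

    def allow(self) -> bool:
        # Without the lock, two threads can both pass the check before
        # either increments, letting more than `limit` requests through.
        with self._lock:
            if self.count >= self.limit:
                return False
            self.count += 1
            return True
```

A race test fires 100 concurrent calls at a limiter with `limit=60` and asserts that exactly 60 succeed. With multiple Uvicorn workers the counter would have to live in a shared store with atomic increments instead.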

Add automated security checks to .github/workflows/ci.yml.

jobs:
  security:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # SAST
      - name: Bandit
        run: |
          pip install bandit
          bandit -r tappass/ -ll -ii -f json -o bandit.json
      # SCA
      - name: pip-audit
        run: |
          pip install pip-audit
          pip-audit -r requirements.txt --strict
      # Semgrep
      - name: Semgrep
        uses: returntocorp/semgrep-action@v1
        with:
          config: >-
            p/python
            p/owasp-top-ten
            p/security-audit
      # Container scanning
      - name: Trivy
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: fs
          severity: HIGH,CRITICAL
      # Secret scanning
      - name: Gitleaks
        uses: gitleaks/gitleaks-action@v2
      # DAST (on deployed preview)
      - name: ZAP API scan
        uses: zaproxy/action-api-scan@v0.9.0
        with:
          target: http://localhost:9620/openapi.json

IMPACT
 High ┌────────────────────────────────┐
      │                                │
      │ P0: Pen test                   │
      │ P0: Load test (PII under load) │
      │ P1: Streaming scan             │
      │ P1: SAST (Bandit/Semgrep)      │
      │ P1: SCA (pip-audit)            │
      │ P1: Infra hardening (TLS, .env)│
      │ P1: CI security gates          │
      │                                │
 Med  │ P2: DAST (ZAP)                 │
      │ P2: Fuzzing                    │
      │ P2: Chaos testing              │
      │ P2: Race conditions            │
      │ P2: Compliance audit           │
      │                                │
 Low  │ P3: Supply chain (SBOM)        │
      │ P3: Red team social eng        │
      │                                │
      └────────────────────────────────┘
       Easy ──────────────────── Hard
                   EFFORT

Category │ Tool/Method │ Time │ Cost
Pen test │ External firm │ 3–5 days │ €3,000–8,000
Pen test │ Self (OWASP ZAP + manual) │ 2–3 days │ Free
Load test │ k6 │ 1 day │ Free
SAST │ Bandit + Semgrep │ 2 hours setup + fix cycle │ Free
SCA │ pip-audit + Trivy │ 30 min setup │ Free
DAST │ ZAP API scan │ 2 hours │ Free
Fuzzing │ Schemathesis │ 1 day │ Free
Chaos test │ Manual scripts │ 1 day │ Free
Infra hardening │ TLS + Docker + .env │ 1 day │ Free
Streaming scan │ Code change │ 2–3 days │ Free
CI gates │ GitHub Actions │ 2 hours │ Free
Compliance │ Audit prep │ 2–4 weeks │ €5,000–20,000 (auditor)
Total self-service: ~8–10 days of work
Total with external pen test + compliance: add €8,000–28,000


# 1. Fix .env permissions
chmod 600 .env
# 2. Install and run Bandit
pip install bandit && bandit -r tappass/ -ll
# 3. Install and run pip-audit
pip install pip-audit && pip-audit
# 4. Disable OpenAPI docs in production
# Set TAPPASS_DOCS_ENABLED=false
# 5. Test streaming PII leakage
curl -N localhost:9620/v1/chat/completions \
-H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
-d '{"model":"gpt-4o-mini","stream":true,"messages":[{"role":"user","content":"Generate a fake customer SSN"}]}'