MCPSecBench Compatibility Matrix
TapPass coverage against the MCPSecBench benchmark. 17 MCP attack categories from Yang, Wu & Chen (arXiv:2508.13220, 2025).
MCPSecBench tests 17 attack types across MCP providers (Claude Desktop, OpenAI, Cursor). TapPass sits as a governance proxy between the agent and MCP servers, catching attacks at runtime rather than relying on the LLM to resist them.
Coverage Summary
Section titled “Coverage Summary”| Result | Count | Percentage |
|---|---|---|
| ✅ Detected & Blocked | 14 | 82% |
| 🟡 Detected (logged) | 2 | 12% |
| 🔴 Not applicable | 1 | 6% |
| Total | 17 | 94% effective |
Detailed Matrix
Section titled “Detailed Matrix”| # | Attack Category | TapPass Step(s) | Coverage | Notes |
|---|---|---|---|---|
| 1 | Prompt Injection | detect_injection (5-category: structural, behavioral, obfuscation, payload, indirect) | ✅ Block | 100+ regex patterns, Unicode normalization, base64/ROT13 decode |
| 2 | Tool/Service Misuse via Confused AI | taint_check (session-scoped), scan_tool_calls | ✅ Block | Cross-request taint tracks EXTERNAL data flowing into dangerous sinks (shell, network, email) |
| 3 | Schema Inconsistencies | tool_integrity (registration + session baselines) | ✅ Block | SHA-256 hash of tool definitions. Mid-session changes flagged as rug pull |
| 4 | Slash Command Overlap | detect_tool_poison (shadowing patterns) | ✅ Block | Detects tool descriptions containing instructions to override other tools |
| 5 | Vulnerable Client | scan_tool_calls, forbidden_zones, detect_code_exec | ✅ Block | 70+ protected paths, 28+ dangerous command patterns, path traversal detection |
| 6 | MCP Rebinding | tool_integrity (session-level hash comparison) | 🟡 Detect | Session tool integrity detects definition changes between requests. DNS rebinding itself is network-layer (outside proxy scope) |
| 7 | Man-in-the-Middle | mTLS (SPIFFE/SPIRE), tool_integrity | 🟡 Detect | TapPass uses mTLS for agent ↔ proxy. MCP server connections depend on transport security. Tool hash changes from MITM are detected |
| 8 | Tool Shadowing Attack | detect_tool_poison (_SHADOW_OVERRIDE_PATTERNS) | ✅ Block | Detects “before using any other tool”, “always call me first”, “override”, “instead of” patterns in tool descriptions |
| 9 | Data Exfiltration | detect_exfiltration, taint_check, scan_tool_calls, shell_bleed | ✅ Block | Paste services, DNS tunneling, file piping, covert channels. Taint tracks PII/secrets flowing into network sinks |
| 10 | Package Name Squatting (Tool Name) | detect_tool_poison (shadowing), tool_permissions | ✅ Block | Tool allowlists per agent prevent unregistered tools. Shadow detection catches tools mimicking legitimate ones |
| 11 | Indirect Prompt Injection | scan_tool_results (10 patterns), redact_tool_results | ✅ Block | Scans role=tool messages for injection before LLM processes them. Redacts PII/secrets in tool output |
| 12 | Package Name Squatting (Server Name) | tool_permissions, verify_tool_governance | ✅ Block | Per-agent tool allowlists. Only approved tools are forwarded to LLM |
| 13 | Configuration Drift | tool_integrity (persistent hashes), tappass assess (rug-pull detection) | ✅ Block | Registration baseline + session baseline + cross-run snapshot diffing |
| 14 | Sandbox Escape | sandbox/forbidden_zones.py, sandbox/trust_tiers.py, detect_code_exec | ✅ Block | Kernel-level isolation (nono), 70+ forbidden paths, trust tier enforcement, dangerous command database |
| 15 | Tool Poisoning Attack | detect_tool_poison (injection patterns in descriptions + parameters) | ✅ Block | Scans function.description, parameter.description, and parameter.enum for injection patterns |
| 16 | Vulnerable Server | scan_tool_results, taint_check, tool_integrity | ✅ Block | Malicious server output caught by indirect injection scan + taint tracking. Tool definition changes detected |
| 17 | Rug Pull Attack | _check_session_tool_integrity, tappass assess --discover-tools (history.py) | ✅ Block | Session-level: hash comparison per request. Assessment-level: SHA-256 snapshot diffing between runs |
Attack Categories Not Covered
Section titled “Attack Categories Not Covered”| Category | Why | Mitigation |
|---|---|---|
| DNS Rebinding (partial) | Network-layer attack: TapPass operates at application layer | Detected indirectly via tool definition changes. Recommend network-level DNS pinning |
How TapPass Differs from Tested Providers
Section titled “How TapPass Differs from Tested Providers”MCPSecBench found that Claude Desktop and OpenAI are vulnerable to 15-16 of 17 attacks (only Prompt Injection is partially blocked). TapPass blocks 14 categories because it operates as a proxy: the LLM never sees malicious content.
| Provider | Attacks Blocked | Architecture |
|---|---|---|
| Claude Desktop | 1-2/17 | Relies on LLM judgment |
| OpenAI | 2-3/17 | Relies on LLM judgment |
| Cursor | 1-2/17 | Relies on LLM judgment |
| TapPass | 14/17 | Deterministic proxy: scans before LLM sees content |
Running the Benchmark
Section titled “Running the Benchmark”MCPSecBench requires a running MCP server to test against. To validate TapPass coverage:
# 1. Run TapPass proxytappass serve --mode enforce
# 2. Point the MCPSecBench client at TapPass proxyexport TAPPASS_PROXY=http://localhost:8080uv run client.py 1 # OpenAI mode
# 3. Run the 11 automated attack scenariosuv run main.py 1 0 # mode=OpenAI, protection=none (TapPass handles protection)The proxy intercepts all 11 scenarios, blocking tool poisoning, shadowing, exfiltration, injection, and rug pulls before they reach the LLM.
References
Section titled “References”- Yang, Wu & Chen. “MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols.” arXiv:2508.13220, 2025.
- MCPSecBench GitHub
- TapPass OWASP Mapping