# Streaming

TapPass supports both sync and async streaming. The full response is buffered, scanned for PII and secrets, and re-chunked before streaming to the client, so PII never leaks through partial chunks.

Sync streaming with `agent.stream`:

```python
for chunk in agent.stream("Write a security policy"):
    print(chunk, end="", flush=True)
print()
```

Async streaming with `agent.astream`:

```python
async for chunk in agent.astream("Summarize findings"):
    print(chunk, end="", flush=True)
```

Per-call scan flags can be passed alongside the prompt:

```python
for chunk in agent.stream("Draft the report", flags={"pii": "mask"}):
    print(chunk, end="", flush=True)
```
Streaming also works through the OpenAI-compatible endpoint:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9620/v1", api_key="tp_...")

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

For `stream: true` requests, TapPass:

  1. Buffers the full LLM response
  2. Runs the output scan pipeline (PII detection, secret scanning, taint check)
  3. Redacts any findings
  4. Re-chunks and streams to the client
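
The four steps above can be sketched roughly as follows. This is a minimal illustration, not TapPass's actual implementation: the two regex patterns stand in for the real scan pipeline, and the chunk size is arbitrary.

```python
import re

# Illustrative patterns only; the real pipeline uses a full
# PII/secret scanner, not two regexes.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "aws_key": re.compile(r"AKIA[0-9A-Z]{16}"),
}

def scan_and_redact(text: str) -> str:
    """Steps 2-3: scan the buffered response and redact findings."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text

def stream_safely(llm_chunks, chunk_size: int = 16):
    """Buffer (1), scan and redact (2-3), re-chunk and stream (4)."""
    full = "".join(llm_chunks)          # 1. buffer the full LLM response
    clean = scan_and_redact(full)       # 2-3. scan + redact
    for i in range(0, len(clean), chunk_size):
        yield clean[i:i + chunk_size]   # 4. re-chunk for the client

# A secret split across incoming chunks is still caught, because
# scanning happens on the buffered whole, never on partial chunks.
chunks = ["Contact ad", "min@corp.io with key AKIA", "ABCDEFGHIJKLMNOP."]
out = "".join(stream_safely(chunks))
```

Note that the email and AWS key in the example each straddle an incoming chunk boundary; a naive per-chunk scanner would miss both, which is exactly why the full response is buffered first.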

This buffering delays the first chunk until the full response has been generated and scanned (roughly 100 ms of added scan time), but it guarantees no sensitive data leaks through partial chunks.