# Streaming

TapPass supports both sync and async streaming. The full response is buffered, scanned for PII and secrets, and re-chunked before streaming to the client, so PII never leaks through partial chunks.

Sync streaming with `agent.stream`:

```python
for chunk in agent.stream("Write a security policy"):
    print(chunk, end="", flush=True)
print()
```

Async streaming with `agent.astream`:

```python
async for chunk in agent.astream("Summarize findings"):
    print(chunk, end="", flush=True)
```

Per-call scan flags can be passed alongside the prompt:

```python
for chunk in agent.stream("Draft the report", flags={"pii": "mask"}):
    print(chunk, end="", flush=True)
```
Streaming also works through the OpenAI-compatible endpoint:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9620/v1", api_key="tp_...")

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

For `stream: true` requests, TapPass:

  1. Buffers the full LLM response
  2. Runs the output scan pipeline (PII detection, secret scanning, taint check)
  3. Redacts any findings
  4. Re-chunks and streams to the client
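
The four steps above can be sketched roughly as follows. This is a minimal illustration, not TapPass's actual implementation: the two regex patterns stand in for the real scan pipeline, and the chunk size is arbitrary.

```python
import re

# Illustrative patterns only; the real pipeline uses a full
# PII/secret scanner, not two regexes.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "aws_key": re.compile(r"AKIA[0-9A-Z]{16}"),
}

def scan_and_redact(text: str) -> str:
    """Steps 2-3: scan the buffered response and redact findings."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text

def stream_safely(llm_chunks, chunk_size: int = 16):
    """Buffer (1), scan and redact (2-3), re-chunk and stream (4)."""
    full = "".join(llm_chunks)          # 1. buffer the full LLM response
    clean = scan_and_redact(full)       # 2-3. scan + redact
    for i in range(0, len(clean), chunk_size):
        yield clean[i:i + chunk_size]   # 4. re-chunk for the client

# A secret split across incoming chunks is still caught, because
# scanning happens on the buffered whole, never on partial chunks.
chunks = ["Contact ad", "min@corp.io with key AKIA", "ABCDEFGHIJKLMNOP."]
out = "".join(stream_safely(chunks))
```

Note that the email and AWS key in the example each straddle an incoming chunk boundary; a naive per-chunk scanner would miss both, which is exactly why the full response is buffered first.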

This buffering delays the first chunk until the full response has been generated and scanned (roughly 100 ms of added scan time), but it guarantees no sensitive data leaks through partial chunks.