# Streaming
TapPass supports both sync and async streaming. The full response is buffered, scanned for PII and secrets, and re-chunked before streaming to the client, so sensitive data never leaks through a partially emitted chunk.
## Sync streaming

```python
for chunk in agent.stream("Write a security policy"):
    print(chunk, end="", flush=True)
print()
```

## Async streaming
```python
async for chunk in agent.astream("Summarize findings"):
    print(chunk, end="", flush=True)
```

## With flags
```python
for chunk in agent.stream("Draft the report", flags={"pii": "mask"}):
    print(chunk, end="", flush=True)
```

## OpenAI SDK streaming
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:9620/v1", api_key="tp_...")

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

## How it works
For `stream: true` requests, TapPass:
- Buffers the full LLM response
- Runs the output scan pipeline (PII detection, secret scanning, taint check)
- Redacts any findings
- Re-chunks and streams to the client
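The buffer-scan-rechunk flow above can be sketched as a wrapper around any upstream chunk iterator. The `scan_and_redact` function, its toy SSN regex, and the chunk size are illustrative stand-ins, not TapPass's actual scanner:

```python
import re
from typing import Iterable, Iterator

# Toy PII pattern (US SSN shape) standing in for the real scan pipeline.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scan_and_redact(text: str) -> str:
    """Stand-in for the output scan pipeline: mask anything SSN-shaped."""
    return SSN_RE.sub("[REDACTED]", text)

def safe_stream(upstream: Iterable[str], chunk_size: int = 16) -> Iterator[str]:
    """Buffer the full upstream response, scan it, then re-chunk for the client."""
    full = "".join(upstream)           # 1. buffer the full response
    clean = scan_and_redact(full)      # 2-3. scan and redact findings
    for i in range(0, len(clean), chunk_size):
        yield clean[i:i + chunk_size]  # 4. re-chunk and stream

# A PII value split across two upstream chunks is still caught,
# because scanning runs on the buffered whole, not per chunk.
chunks = ["My SSN is 123-4", "5-6789, keep it safe."]
print("".join(safe_stream(chunks)))
# → My SSN is [REDACTED], keep it safe.
```

This is why buffering matters: a per-chunk scanner would miss the SSN above, since neither chunk on its own matches the pattern.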
This buffering adds roughly 100 ms of latency, but guarantees that no sensitive data leaks through partial chunks.