Use this pattern when an LLM produces text incrementally and you want the user to hear audio before the full answer is finished. The core idea:
- Open one TTS WebSocket before the LLM starts.
- Buffer LLM tokens into word or phrase chunks.
- Send each chunk as a `text` message.
- Add `<flush>` when the LLM finishes a sentence or the whole answer.
- Read `audio` messages concurrently and stream them to playback.
Streaming LLM Tokens to TTS
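The steps listed above can be sketched in Python with `asyncio`. This is a minimal illustration, not the service's reference client. The JSON shapes (`{"type": "text", "text": ...}`, a base64 `audio` field on `audio` messages) and sending `<flush>` as its own `text` message are assumptions, and `send`, `incoming`, and `play` are hypothetical stand-ins for an already-open WebSocket and a local audio player.

```python
import asyncio
import base64
import json
from typing import AsyncIterator, Awaitable, Callable


async def send_text(
    chunks: AsyncIterator[str],
    send: Callable[[str], Awaitable[None]],
) -> None:
    """Forward word/phrase chunks as text messages, then flush once at the end."""
    async for chunk in chunks:
        # Assumed message shape; check the TTS WebSocket guide for the real one.
        await send(json.dumps({"type": "text", "text": chunk}))
    # Assumption: the flush tag travels inside a text message.
    await send(json.dumps({"type": "text", "text": "<flush>"}))


async def play_audio(
    incoming: AsyncIterator[str],
    play: Callable[[bytes], None],
) -> None:
    """Read server messages and hand decoded audio to playback as it arrives."""
    async for raw in incoming:
        message = json.loads(raw)
        if message.get("type") == "audio":
            # Assumed field name and base64 encoding for the audio payload.
            play(base64.b64decode(message["audio"]))


async def stream_answer(
    chunks: AsyncIterator[str],
    send: Callable[[str], Awaitable[None]],
    incoming: AsyncIterator[str],
    play: Callable[[bytes], None],
) -> None:
    """Run sending and receiving concurrently over one open connection."""
    await asyncio.gather(send_text(chunks, send), play_audio(incoming, play))
```

With a client such as the `websockets` package, `send` could be the connection's `send` method and `incoming` the connection itself, since it can be iterated with `async for`. The receiver here stops when the incoming stream ends.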
Chunking Rules
- Send complete words, phrases, or sentences.
- Keep punctuation attached to the preceding word.
- Avoid sending one token per message unless tokens are already clean word chunks.
- Add `<flush>` at natural boundaries, especially when the LLM has completed the response.
- Avoid frequent flushes; they reduce the model’s context and can make prosody choppy.
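One way to apply these rules is a small buffer between the LLM and the socket. The sketch below is an illustration under stated assumptions: `chunk_tokens` is a hypothetical helper, tokens are assumed to arrive as raw string pieces that may split words, and `<flush>` is emitted only once, as a separate chunk, when the response ends.

```python
from typing import Iterable, Iterator

FLUSH = "<flush>"


def chunk_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Turn raw LLM token pieces into whole-word chunks, then flush at the end."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Text up to the last whitespace consists of complete words, with any
        # punctuation still attached to the word it follows.
        cut = max(buffer.rfind(" "), buffer.rfind("\n"))
        if cut != -1:
            yield buffer[: cut + 1]
            buffer = buffer[cut + 1 :]
    # The LLM is done: send the last word, then a single flush.
    if buffer:
        yield buffer
    yield FLUSH


pieces = ["Hel", "lo", " wor", "ld", ".", " How", " are", " you", "?"]
print(list(chunk_tokens(pieces)))
# ['Hello ', 'world. ', 'How ', 'are ', 'you?', '<flush>']
```

The last word is held back until more text or the end of the stream arrives, so a word split across tokens is never sent in halves.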
Handling Interruptions
If the user interrupts the agent:
- Stop sending new text.
- Stop playback locally.
- Close the current WebSocket.
- Open a fresh WebSocket for the next answer.
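The interruption sequence above might look like the following in an `asyncio` client. This is a sketch only: `Player`, `Connection`, `handle_interruption`, and the `connect` factory are hypothetical names, and the sender is assumed to run as its own task so it can be cancelled.

```python
import asyncio
from typing import Any, Awaitable, Callable, Protocol


class Player(Protocol):
    def stop(self) -> None: ...


class Connection(Protocol):
    async def close(self) -> None: ...


async def handle_interruption(
    sender_task: "asyncio.Task[Any]",
    player: Player,
    conn: Connection,
    connect: Callable[[], Awaitable[Connection]],
) -> Connection:
    """Abandon the current answer and return a fresh connection for the next one."""
    # 1. Stop sending new text: cancel the task that forwards LLM chunks.
    sender_task.cancel()
    await asyncio.gather(sender_task, return_exceptions=True)
    # 2. Stop playback locally so queued audio is not heard.
    player.stop()
    # 3. Close the current WebSocket; audio still in flight is discarded.
    await conn.close()
    # 4. Open a fresh WebSocket for the next answer.
    return await connect()
```

Closing the socket rather than reusing it means no late audio from the interrupted answer can leak into the next one.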
client_req_id.
Related
Text-to-Speech WebSocket
Full TTS streaming guide.
WebSocket Lifecycle
Setup, ready, input, flush, end, and errors.