Use this pattern when an LLM produces text incrementally and you want the user to hear audio before the full answer is finished. The core idea:
  1. Open one TTS WebSocket before the LLM starts.
  2. Buffer LLM tokens into word or phrase chunks.
  3. Send each chunk as a text message.
  4. Add <flush> when the LLM finishes a sentence or the whole answer.
  5. Read audio messages concurrently and stream them to playback.

Streaming LLM Tokens to TTS

import asyncio
import gradium


async def play_pcm(chunk: bytes):
    # Replace with your audio playback sink.
    ...


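async def phrase_chunks(llm_tokens):
    """Group raw LLM tokens into phrase-sized chunks for the TTS socket."""
    buffer = []
    async for token in llm_tokens:
        buffer.append(token)
        text = "".join(buffer)
        # Emit at sentence punctuation, at a comma followed by a space,
        # or once the buffer grows past 80 characters.
        if text.endswith((".", "?", "!", ", ")) or len(text) > 80:
            yield text
            buffer = []
    # The LLM is done: send any leftover text and ask the model to flush it.
    if buffer:
        yield "".join(buffer) + " <flush>"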


async def speak_llm(llm_tokens):
    client = gradium.client.GradiumClient(api_key="your-api-key")

    async with client.tts_realtime(
        voice_id="YTpq7expH9539ERJ",
        output_format="pcm",
    ) as tts:
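        async def sender():
            # Forward each chunk as soon as it is ready, then signal end of input.
            async for chunk in phrase_chunks(llm_tokens):
                await tts.send_text(chunk)
            await tts.send_eos()

        async def receiver():
            # Play audio as it arrives; stop once the server ends the stream.
            async for msg in tts:
                if msg["type"] == "audio":
                    await play_pcm(msg["audio"])
                elif msg["type"] == "end_of_stream":
                    return

        # Run both directions concurrently so playback starts before the LLM finishes.
        await asyncio.gather(sender(), receiver())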

Chunking Rules

  • Send complete words, phrases, or sentences.
  • Keep punctuation attached to the preceding word.
  • Avoid sending one token per message unless tokens are already clean word chunks.
  • Add <flush> at natural boundaries, especially when the LLM has completed the response.
  • Avoid frequent flushes; they reduce the model’s context and can make prosody choppy.
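
The rules above can be sketched as a chunker that never splits a word when it hits the length limit. This is a minimal illustration, not part of the Gradium SDK: `word_safe_chunks`, `fake_tokens`, `collect`, and `MAX_CHARS` are hypothetical names, and the 80-character limit is simply carried over from the example earlier on this page. It treats any `.`, `?`, or `!` at the end of the buffer as a sentence end, so text such as decimals split across tokens may still be cut early.

```python
import asyncio
from typing import AsyncIterator, Iterable, List

SENTENCE_END = (".", "?", "!")
MAX_CHARS = 80  # assumed soft limit, same value as the example above


async def word_safe_chunks(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Yield chunks that end on a sentence or word boundary."""
    buffer = ""
    async for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END):
            # Sentence finished: punctuation stays attached to its word.
            yield buffer
            buffer = ""
        elif len(buffer) > MAX_CHARS:
            # Too long: cut at the last space so no word is split in two.
            cut = buffer.rstrip().rfind(" ")
            if cut > 0:
                yield buffer[: cut + 1]
                buffer = buffer[cut + 1 :]
    # The response is complete: send the remainder with a single <flush>.
    if buffer.strip():
        yield buffer.rstrip() + " <flush>"


async def fake_tokens(parts: Iterable[str]) -> AsyncIterator[str]:
    """Stand-in for an LLM token stream (hypothetical helper)."""
    for part in parts:
        yield part


async def collect(chunks: AsyncIterator[str]) -> List[str]:
    """Gather every chunk into a list for inspection."""
    return [chunk async for chunk in chunks]


if __name__ == "__main__":
    parts = ["Hel", "lo", " wor", "ld", ".", " How", " are", " you"]
    print(asyncio.run(collect(word_safe_chunks(fake_tokens(parts)))))
```

Sub-word tokens such as `"Hel"` and `"lo"` are merged before anything is sent, and `<flush>` appears only once, on the final chunk, which keeps flushes infrequent.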

Handling Interruptions

If the user interrupts the agent:
  1. Stop sending new text.
  2. Stop playback locally.
  3. Close the current WebSocket.
  4. Open a fresh WebSocket for the next answer.
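
One way to organize these four steps is to tie each answer to its own task and connection, so that an interruption tears down exactly one turn. The sketch below uses a stand-in `FakeTTS` class rather than the real client; `FakeTTS`, `Turn`, `interrupt`, and the `close()` method are assumptions made for illustration, so adapt them to however your connection is actually opened and closed.

```python
import asyncio


class FakeTTS:
    """Stand-in for a realtime TTS connection; NOT the Gradium API."""

    def __init__(self):
        self.sent = []
        self.closed = False

    async def send_text(self, text):
        self.sent.append(text)

    async def close(self):
        self.closed = True


class Turn:
    """One agent answer: one connection, one sender task, one playback flag."""

    def __init__(self, tts):
        self.tts = tts
        self.playing = True
        self.task = None

    async def _send(self, chunks):
        for chunk in chunks:
            await self.tts.send_text(chunk)
            await asyncio.sleep(0.05)  # simulate waiting for the next LLM chunk

    def start(self, chunks):
        self.task = asyncio.create_task(self._send(chunks))

    async def interrupt(self):
        # 1. Stop sending new text.
        self.task.cancel()
        await asyncio.gather(self.task, return_exceptions=True)
        # 2. Stop playback locally (a real sink would also drop queued audio).
        self.playing = False
        # 3. Close the current connection.
        await self.tts.close()


async def demo():
    first = Turn(FakeTTS())
    first.start(["one ", "two ", "three ", "four "])
    await asyncio.sleep(0.01)  # the user barges in shortly after the answer starts
    await first.interrupt()
    # 4. Open a fresh connection for the next answer.
    second = Turn(FakeTTS())
    second.start(["A new answer."])
    await second.task
    return first, second


if __name__ == "__main__":
    first, second = asyncio.run(demo())
    print(first.tts.sent, first.tts.closed, second.tts.sent)
```

Because the interrupted turn owns its connection, any audio still in flight for the old answer cannot leak into the next one.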
For several independent replies on one connection, use Multiplexing and route audio by client_req_id.
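
Routing by request ID can be sketched as a small dispatcher that copies each audio payload into a per-reply queue. The message shape here is an assumption for illustration: it reuses the `type` and `audio` keys from the example above and supposes that each audio message also carries a `client_req_id` key. `route_audio`, `fake_messages`, and `drain` are hypothetical helpers, so check the Multiplexing guide for the actual fields.

```python
import asyncio


async def route_audio(messages, queues):
    """Copy each audio payload into the queue registered for its request ID."""
    async for msg in messages:
        if msg.get("type") != "audio":
            continue  # ignore non-audio messages in this sketch
        queue = queues.get(msg.get("client_req_id"))
        if queue is not None:
            await queue.put(msg["audio"])
        # Audio for unknown IDs (for example, a cancelled reply) is dropped.


async def fake_messages(items):
    """Stand-in for messages read from the shared connection."""
    for item in items:
        yield item


def drain(queue):
    """Return everything currently in the queue as a list."""
    out = []
    while not queue.empty():
        out.append(queue.get_nowait())
    return out


async def demo():
    queues = {"reply-a": asyncio.Queue(), "reply-b": asyncio.Queue()}
    incoming = [
        {"type": "audio", "client_req_id": "reply-a", "audio": b"a1"},
        {"type": "audio", "client_req_id": "reply-b", "audio": b"b1"},
        {"type": "audio", "client_req_id": "stale", "audio": b"xx"},
        {"type": "audio", "client_req_id": "reply-a", "audio": b"a2"},
        {"type": "end_of_stream"},
    ]
    await route_audio(fake_messages(incoming), queues)
    return {req_id: drain(q) for req_id, q in queues.items()}


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Each reply's playback loop then reads only from its own queue, so interleaved audio from different replies never mixes.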

Text-to-Speech WebSocket

Full TTS streaming guide.

WebSocket Lifecycle

Setup, ready, input, flush, end, and errors.