Multiplexing lets you send multiple independent requests over one WebSocket connection. Each logical request gets a client_req_id; the server copies that value onto responses so your client can route audio, text, VAD, and end events back to the right caller. This is useful when you want to avoid opening a new WebSocket for every short utterance or when a server needs to process many low-latency TTS requests in parallel.

How Multiplexing Works

| Mode | How to use it | Behavior |
|---|---|---|
| Single-use | Send setup without close_ws_on_eos: false | The socket closes after end_of_stream. |
| Reusable sequential | Set close_ws_on_eos: false and omit client_req_id | The socket stays open and processes one request at a time. |
| Concurrent multiplexed | Set close_ws_on_eos: false and include client_req_id on every message | Multiple active requests share the socket. |
To run requests concurrently on the same socket:
  1. Set close_ws_on_eos: false in every setup.
  2. Generate a unique client_req_id per logical request.
  3. Include that same client_req_id on setup, input messages, and end_of_stream.
  4. Route responses by client_req_id.
If you reuse a client_req_id while the previous request is still active, the server returns a protocol error.
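The four steps above can be sketched as a small client-side helper. This is a hypothetical class, not part of the Gradium SDK: it only builds JSON text frames (field names taken from the wire transcript on this page) that you would pass to your WebSocket's send call. Generating ids with uuid4 is one convenient choice, and the local "still active" check simply mirrors the server's protocol error so a reused id fails fast on the client.

```python
import json
import uuid
from typing import Set


class ScopedSender:
    """Hypothetical helper (not part of the Gradium SDK) that builds
    client_req_id-scoped JSON frames for one reusable WebSocket."""

    def __init__(self) -> None:
        # client_req_id values whose server-side end_of_stream has not arrived yet.
        self.active: Set[str] = set()

    def new_request_id(self) -> str:
        # A random UUID is unique per logical request without any coordination.
        return f"req-{uuid.uuid4().hex}"

    def setup(self, req_id: str, **fields) -> str:
        if req_id in self.active:
            # Reusing an active id would make the server return a protocol error.
            raise ValueError(f"client_req_id {req_id!r} is still active")
        self.active.add(req_id)
        # close_ws_on_eos is forced to False so the socket stays open for other requests.
        return json.dumps({"type": "setup", **fields,
                           "close_ws_on_eos": False, "client_req_id": req_id})

    def text(self, req_id: str, text: str) -> str:
        return json.dumps({"type": "text", "text": text, "client_req_id": req_id})

    def end_of_stream(self, req_id: str) -> str:
        return json.dumps({"type": "end_of_stream", "client_req_id": req_id})

    def finished(self, req_id: str) -> None:
        # Call this when the server's end_of_stream (or an error) for req_id arrives.
        self.active.discard(req_id)


sender = ScopedSender()
req_id = sender.new_request_id()
frames = [
    sender.setup(req_id, voice_id="RI2y7oBdsQJmkgFF", output_format="pcm"),
    sender.text(req_id, "First request."),
    sender.end_of_stream(req_id),
]
for frame in frames:
    print(frame)  # in a real client, send each frame on the shared WebSocket
```

Only release an id with finished() after the server confirms the request is over; releasing it when you send end_of_stream could still collide with a request the server considers active.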

TTS Example

import asyncio
import collections
import gradium


async def synthesize_many():
    setup = {
        "voice_id": "RI2y7oBdsQJmkgFF",
        "output_format": "wav",
        "close_ws_on_eos": False,  # Keep the socket open so requests can share it.
    }
    texts = [
        "First request. Second part, last one.",
        "Second request. Second part, last one again.",
    ]

    client = gradium.client.GradiumClient(api_key="your-api-key")
    async with client.tts_realtime(send_setup_on_start=False) as stream:

        async def send_loop():
            for idx, text in enumerate(texts):
                # One unique client_req_id per logical request, repeated on
                # setup, text, and end_of_stream.
                stamp = {"client_req_id": f"req-{idx:02d}"}
                await stream.send_setup(setup | stamp)
                await stream.send_text(text, **stamp)
                await stream.send_eos(**stamp)

        async def recv_loop():
            # Audio chunks grouped by the client_req_id the server echoes back.
            audio = collections.defaultdict(list)
            num_eos = 0
            async for msg in stream:
                if msg["type"] == "audio":
                    audio[msg.get("client_req_id")].append(msg["audio"])
                elif msg["type"] == "end_of_stream":
                    # Stop once every request has reported its end_of_stream.
                    num_eos += 1
                    if num_eos == len(texts):
                        break
            return audio

        _, audio = await asyncio.gather(send_loop(), recv_loop())
        return {k: b"".join(v) for k, v in audio.items()}


audio_by_request = asyncio.run(synthesize_many())
With the Python SDK, the audio field of each audio message already holds decoded bytes. If you talk to the WebSocket directly, the audio field is a base64-encoded string.
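When you talk to the WebSocket directly, each audio frame has to be parsed and base64-decoded before routing. The sketch below assumes frames arrive as JSON text shaped like the wire transcript on this page (type, client_req_id, audio); the function name is hypothetical.

```python
import base64
import json
from typing import Optional, Tuple


def decode_audio_message(raw: str) -> Tuple[Optional[str], bytes]:
    """Parse one raw server frame of type "audio" and return
    (client_req_id, decoded audio bytes)."""
    msg = json.loads(raw)
    if msg.get("type") != "audio":
        raise ValueError(f"expected an audio message, got {msg.get('type')!r}")
    # On the raw WebSocket the audio field is base64 text, not bytes.
    return msg.get("client_req_id"), base64.b64decode(msg["audio"])


frame = json.dumps({
    "type": "audio",
    "client_req_id": "req-a",
    "audio": base64.b64encode(b"\x00\x01\x02\x03").decode("ascii"),
})
print(decode_audio_message(frame))  # ('req-a', b'\x00\x01\x02\x03')
```

The client_req_id comes back as None for an unscoped (reusable sequential) request, which lets the same function serve both modes.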

Wire Transcript

Client
{"type":"setup","voice_id":"RI2y7oBdsQJmkgFF","output_format":"pcm","close_ws_on_eos":false,"client_req_id":"req-a"}
{"type":"text","text":"First request.","client_req_id":"req-a"}
{"type":"end_of_stream","client_req_id":"req-a"}
{"type":"setup","voice_id":"RI2y7oBdsQJmkgFF","output_format":"pcm","close_ws_on_eos":false,"client_req_id":"req-b"}
{"type":"text","text":"Second request.","client_req_id":"req-b"}
{"type":"end_of_stream","client_req_id":"req-b"}
Server
{"type":"ready","request_id":"...","client_req_id":"req-a","sample_rate":48000,"frame_size":3840}
{"type":"ready","request_id":"...","client_req_id":"req-b","sample_rate":48000,"frame_size":3840}
{"type":"audio","client_req_id":"req-b","audio":"...","start_s":0.0,"stop_s":0.08}
{"type":"audio","client_req_id":"req-a","audio":"...","start_s":0.0,"stop_s":0.08}
{"type":"end_of_stream","client_req_id":"req-b"}
{"type":"end_of_stream","client_req_id":"req-a"}
Responses may arrive interleaved. Do not assume the first request ends before the second one starts returning audio.
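One way to cope with interleaving is a single reader task that owns the socket and hands each message to a per-request asyncio.Queue, so every caller sees only its own messages in arrival order. This is a minimal sketch under stated assumptions: ResponseRouter and collect are hypothetical names, and fake_socket stands in for a real WebSocket's incoming-message iterator, replaying the server side of the transcript above.

```python
import asyncio
import json
from typing import AsyncIterator, Dict, List


class ResponseRouter:
    """Hypothetical router: one asyncio.Queue per active client_req_id."""

    def __init__(self) -> None:
        self.queues: Dict[str, asyncio.Queue] = {}

    def register(self, req_id: str) -> asyncio.Queue:
        # Register before sending setup so no early response is dropped.
        queue: asyncio.Queue = asyncio.Queue()
        self.queues[req_id] = queue
        return queue

    async def dispatch(self, frames: AsyncIterator[str]) -> None:
        # The only reader of the shared socket: hand each message to its owner.
        async for raw in frames:
            msg = json.loads(raw)
            queue = self.queues.get(msg.get("client_req_id"))
            if queue is None:
                continue  # unscoped or unknown id: handle at the connection level
            await queue.put(msg)
            if msg["type"] in ("end_of_stream", "error"):
                del self.queues[msg["client_req_id"]]  # request is finished


async def collect(queue: asyncio.Queue) -> List[str]:
    # One caller's view: only its own message types, in arrival order.
    seen = []
    while True:
        msg = await queue.get()
        seen.append(msg["type"])
        if msg["type"] in ("end_of_stream", "error"):
            return seen


SERVER_FRAMES = [  # the interleaved server side of the transcript above
    '{"type":"ready","client_req_id":"req-a"}',
    '{"type":"ready","client_req_id":"req-b"}',
    '{"type":"audio","client_req_id":"req-b","audio":""}',
    '{"type":"audio","client_req_id":"req-a","audio":""}',
    '{"type":"end_of_stream","client_req_id":"req-b"}',
    '{"type":"end_of_stream","client_req_id":"req-a"}',
]


async def fake_socket():
    for frame in SERVER_FRAMES:
        yield frame


async def main():
    router = ResponseRouter()
    qa, qb = router.register("req-a"), router.register("req-b")
    _, a, b = await asyncio.gather(router.dispatch(fake_socket()), collect(qa), collect(qb))
    return a, b


print(asyncio.run(main()))
# (['ready', 'audio', 'end_of_stream'], ['ready', 'audio', 'end_of_stream'])
```

Even though req-b finishes first on the wire, each caller still receives a complete, correctly ordered sequence.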

STT Notes

The same client_req_id mechanism exists on STT WebSockets. Use it only when each audio source is a separate logical stream and your client can route every audio chunk, flush, and end_of_stream to the right request.
{"type":"setup","model_name":"default","input_format":"pcm","close_ws_on_eos":false,"client_req_id":"caller-1"}
{"type":"audio","audio":"base64_audio","client_req_id":"caller-1"}
{"type":"flush","flush_id":1,"client_req_id":"caller-1"}
{"type":"end_of_stream","client_req_id":"caller-1"}
For most live microphone or telephony applications, one STT WebSocket per live speaker stream is easier to reason about. Multiplex STT only when connection overhead matters and you have tests that verify every response is routed to the correct request.
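The STT frames above can be generated per logical stream so that every setup, audio chunk, flush, and end_of_stream carries the same client_req_id. This is a sketch: stt_messages is a hypothetical name, the message fields are copied from the example above, and the 4096-byte chunk size is an arbitrary illustration rather than a documented limit.

```python
import base64
import json
from itertools import zip_longest
from typing import Iterator


def stt_messages(req_id: str, pcm: bytes, chunk_size: int = 4096) -> Iterator[str]:
    """Yield every client frame for one logical STT stream, all scoped to req_id.

    chunk_size is an arbitrary byte count chosen for illustration."""
    yield json.dumps({"type": "setup", "model_name": "default", "input_format": "pcm",
                      "close_ws_on_eos": False, "client_req_id": req_id})
    for start in range(0, len(pcm), chunk_size):
        chunk = pcm[start:start + chunk_size]
        # Raw WebSocket audio must be base64 text.
        yield json.dumps({"type": "audio",
                          "audio": base64.b64encode(chunk).decode("ascii"),
                          "client_req_id": req_id})
    yield json.dumps({"type": "flush", "flush_id": 1, "client_req_id": req_id})
    yield json.dumps({"type": "end_of_stream", "client_req_id": req_id})


# Two callers share one socket; frames interleave, but each request keeps its own order.
caller_1 = stt_messages("caller-1", b"\x00" * 10000)  # setup + 3 audio + flush + eos
caller_2 = stt_messages("caller-2", b"\x01" * 5000)   # setup + 2 audio + flush + eos
merged = [f for pair in zip_longest(caller_1, caller_2) for f in pair if f is not None]
print(len(merged))  # 11
```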

Closing a Reusable Socket

After all logical requests have finished, send an unscoped end_of_stream to close the reusable socket:
{"type":"end_of_stream"}
If active requests are still running, the server closes after they complete. If no requests are active, it closes immediately.
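Because the server waits for active requests before closing, a client can send the unscoped end_of_stream right after its last scoped one and then keep reading until the connection ends. In the sketch below, send and frames are placeholders for your WebSocket library's send coroutine and incoming-message iterator (with the websockets library, for example, async iteration stops on a normal close); the function name and the demo are hypothetical.

```python
import asyncio
import json
from typing import AsyncIterator, Awaitable, Callable, List


async def close_reusable_socket(
    send: Callable[[str], Awaitable[None]],
    frames: AsyncIterator[str],
) -> List[dict]:
    """Send the unscoped end_of_stream, then drain frames until the server closes.

    `send` and `frames` are placeholders for your WebSocket library's send
    coroutine and incoming-message iterator."""
    # No client_req_id: this targets the socket, not one logical request.
    await send(json.dumps({"type": "end_of_stream"}))
    remaining = []
    # Requests still running finish first, so their last messages may arrive here.
    async for raw in frames:
        remaining.append(json.loads(raw))
    return remaining  # the iterator ended, so the connection is closed


async def demo() -> List[dict]:
    sent: List[str] = []

    async def fake_send(data: str) -> None:
        sent.append(data)

    async def fake_frames():
        # Stand-in for a socket that delivers one last frame and then closes.
        yield '{"type":"end_of_stream","client_req_id":"req-a"}'

    remaining = await close_reusable_socket(fake_send, fake_frames())
    print(sent)  # ['{"type": "end_of_stream"}']
    return remaining


print(asyncio.run(demo()))  # [{'type': 'end_of_stream', 'client_req_id': 'req-a'}]
```

In a real client, route the drained messages through the same per-request logic you use during normal operation rather than just collecting them.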

Error Handling

Errors include client_req_id when the server can identify the logical request:
{"type":"error","message":"Session already active (req-a).","code":1002,"client_req_id":"req-a"}
Treat an error as terminal for that logical request. Depending on the error and whether other sessions are active, the WebSocket may close after outstanding requests finish.
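A scoped error should end only its own request while the others keep running. The bookkeeping below is a hypothetical sketch of that rule. How to treat an error that arrives without client_req_id is not specified here, so failing every running request in that case is an assumption of this sketch, flagged in the code.

```python
import json
from typing import Dict, Iterable, Optional, Set


class RequestStates:
    """Hypothetical bookkeeping for logical requests sharing one socket."""

    def __init__(self, req_ids: Iterable[str]) -> None:
        self.running: Set[str] = set(req_ids)
        self.completed: Set[str] = set()
        self.failed: Dict[str, str] = {}  # client_req_id -> error message
        self.connection_error: Optional[str] = None

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        req_id = msg.get("client_req_id")
        if msg["type"] == "error":
            message = msg.get("message", "")
            if req_id is not None:
                # Terminal for this logical request only; the others keep running.
                self.running.discard(req_id)
                self.failed[req_id] = message
            else:
                # Assumption: an error without client_req_id cannot be attributed,
                # so this sketch fails every request that is still running.
                self.connection_error = message
                for rid in self.running:
                    self.failed[rid] = message
                self.running.clear()
        elif msg["type"] == "end_of_stream" and req_id is not None:
            self.running.discard(req_id)
            self.completed.add(req_id)


states = RequestStates(["req-a", "req-b"])
states.handle('{"type":"error","message":"Session already active (req-a).",'
              '"code":1002,"client_req_id":"req-a"}')
states.handle('{"type":"end_of_stream","client_req_id":"req-b"}')
print(states.failed)     # {'req-a': 'Session already active (req-a).'}
print(states.completed)  # {'req-b'}
```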
