Multiplexing - Gradium API

Multiplexing lets you send multiple independent requests over one WebSocket connection. Each logical request gets a client_req_id; the server copies that value onto responses so your client can route audio, text, VAD, and end events back to the right caller. This is useful when you want to avoid opening a new WebSocket for every short utterance or when a server needs to process many low-latency TTS requests in parallel.

How Multiplexing Works

Mode	How to use it	Behavior
Single-use	Send setup without `close_ws_on_eos: false`	The socket closes after `end_of_stream`.
Reusable sequential	Set `close_ws_on_eos: false`, omit `client_req_id`	Keep the socket open and process one request at a time.
Concurrent multiplexed	Set `close_ws_on_eos: false`, include `client_req_id` on every message	Multiple active requests share the socket.

To run requests concurrently on the same socket:

Set close_ws_on_eos: false in every setup.
Generate a unique client_req_id per logical request.
Include that same client_req_id on setup, input messages, and end_of_stream.
Route responses by client_req_id.

If you reuse a client_req_id while the previous request is still active, the server returns a protocol error.
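
The steps above can be sketched on the client side as a small registry that hands out unique IDs and refuses to reuse one while it is still active. This is an illustration only: `RequestRegistry` and its methods are hypothetical names, not part of the Gradium SDK. Only the `client_req_id` field comes from the protocol described here.

```python
import uuid


class RequestRegistry:
    """Hypothetical client-side helper; not part of the Gradium SDK."""

    def __init__(self):
        self._active = set()

    def open(self, client_req_id=None):
        """Reserve an ID for a new logical request and return it."""
        req_id = client_req_id or f"req-{uuid.uuid4().hex}"
        if req_id in self._active:
            # Mirrors the server rule: an ID cannot be reused while active.
            raise ValueError(f"client_req_id {req_id!r} is still active")
        self._active.add(req_id)
        return req_id

    def close(self, req_id):
        """Release an ID once its end_of_stream (or error) has arrived."""
        self._active.discard(req_id)

    def stamp(self, message, req_id):
        """Return a copy of an outgoing message tagged with the request ID."""
        if req_id not in self._active:
            raise KeyError(f"unknown client_req_id {req_id!r}")
        return {**message, "client_req_id": req_id}
```

Catching the reuse locally, before anything is sent, avoids a round trip that would end in a protocol error.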

TTS Example

import asyncio
import collections
import gradium


async def synthesize_many():
    setup = {
        "voice_id": "RI2y7oBdsQJmkgFF",
        "output_format": "wav",
        # Keep the socket open after end_of_stream; required for multiplexing.
        "close_ws_on_eos": False,
    }
    texts = [
        "First request. Second part, last one.",
        "Second request. Second part, last one again.",
    ]

    client = gradium.client.GradiumClient(api_key="your-api-key")
    async with client.tts_realtime(send_setup_on_start=False) as stream:

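        async def send_loop():
            # Send every request back to back without waiting for replies.
            for idx, text in enumerate(texts):
                # The same client_req_id goes on setup, text, and end_of_stream.
                stamp = {"client_req_id": f"req-{idx:02d}"}
                await stream.send_setup(setup | stamp)
                await stream.send_text(text, **stamp)
                await stream.send_eos(**stamp)

        async def recv_loop():
            # Collect audio chunks per request; responses may be interleaved.
            audio = collections.defaultdict(list)
            num_eos = 0
            async for msg in stream:
                if msg["type"] == "audio":
                    audio[msg.get("client_req_id")].append(msg["audio"])
                elif msg["type"] == "end_of_stream":
                    # Stop once every request has reported end_of_stream.
                    num_eos += 1
                    if num_eos == len(texts):
                        break
            return audio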

        _, audio = await asyncio.gather(send_loop(), recv_loop())
        return {k: b"".join(v) for k, v in audio.items()}


audio_by_request = asyncio.run(synthesize_many())

Each audio message contains decoded bytes when using the Python SDK. If you talk to the WebSocket directly, audio is base64 encoded.
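
If you bypass the SDK, the decoding step might look like the sketch below. It assumes each frame arrives as a JSON text message with the `type`, `audio`, and `client_req_id` fields used on this page. The function name `audio_bytes_from_frame` is hypothetical.

```python
import base64
import json


def audio_bytes_from_frame(raw_frame):
    """Return (client_req_id, audio bytes) for an audio frame, else None."""
    msg = json.loads(raw_frame)
    if msg.get("type") != "audio":
        return None
    # On the wire, the audio payload is a base64 string.
    return msg.get("client_req_id"), base64.b64decode(msg["audio"])
```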

Wire Transcript

Client

{"type":"setup","voice_id":"RI2y7oBdsQJmkgFF","output_format":"pcm","close_ws_on_eos":false,"client_req_id":"req-a"}
{"type":"text","text":"First request.","client_req_id":"req-a"}
{"type":"end_of_stream","client_req_id":"req-a"}
{"type":"setup","voice_id":"RI2y7oBdsQJmkgFF","output_format":"pcm","close_ws_on_eos":false,"client_req_id":"req-b"}
{"type":"text","text":"Second request.","client_req_id":"req-b"}
{"type":"end_of_stream","client_req_id":"req-b"}

Server

{"type":"ready","request_id":"...","client_req_id":"req-a","sample_rate":48000,"frame_size":3840}
{"type":"ready","request_id":"...","client_req_id":"req-b","sample_rate":48000,"frame_size":3840}
{"type":"audio","client_req_id":"req-b","audio":"...","start_s":0.0,"stop_s":0.08}
{"type":"audio","client_req_id":"req-a","audio":"...","start_s":0.0,"stop_s":0.08}
{"type":"end_of_stream","client_req_id":"req-b"}
{"type":"end_of_stream","client_req_id":"req-a"}

Responses may arrive interleaved. Do not assume the first request ends before the second one starts returning audio.
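
One way to cope with interleaving is to give every logical request its own queue, so each caller awaits only its own messages. The sketch below assumes already-parsed messages shaped like the server transcript above. `ResponseRouter` and `collect` are hypothetical names, not SDK APIs.

```python
import asyncio


class ResponseRouter:
    """Hypothetical helper that fans server messages out by client_req_id."""

    def __init__(self):
        self._queues = {}

    def register(self, req_id):
        """Create and return the queue for a new logical request."""
        queue = asyncio.Queue()
        self._queues[req_id] = queue
        return queue

    def dispatch(self, msg):
        """Route one message; return False if no request is waiting for it."""
        req_id = msg.get("client_req_id")
        queue = self._queues.get(req_id)
        if queue is None:
            return False
        queue.put_nowait(msg)
        if msg["type"] == "end_of_stream":
            # The request is finished, so its ID no longer needs routing.
            del self._queues[req_id]
        return True


async def collect(queue):
    """Drain one request's queue until its end_of_stream arrives."""
    chunks = []
    while True:
        msg = await queue.get()
        if msg["type"] == "end_of_stream":
            return chunks
        if msg["type"] == "audio":
            chunks.append(msg["audio"])
```

A single reader task would call `dispatch` for every incoming message while each caller awaits `collect` on its own queue, so the order in which requests finish does not matter.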

STT Notes

The same client_req_id mechanism exists on STT WebSockets. Use it only when each audio source is a separate logical stream and your client can route every audio chunk, flush, and end_of_stream to the right request.

{"type":"setup","model_name":"default","input_format":"pcm","close_ws_on_eos":false,"client_req_id":"caller-1"}
{"type":"audio","audio":"base64_audio","client_req_id":"caller-1"}
{"type":"flush","flush_id":1,"client_req_id":"caller-1"}
{"type":"end_of_stream","client_req_id":"caller-1"}

For most live microphone or telephony applications, one STT WebSocket per live speaker stream is easier to reason about. Multiplex STT only when connection overhead matters and you have strict routing tests.
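
If you do multiplex STT, building every frame for a stream in one place makes it harder to forget the ID. The sketch below reuses the field names from the STT messages above. The generator name `stt_messages` is hypothetical, and `flush` frames are left out for brevity.

```python
import base64
import json


def stt_messages(client_req_id, pcm_chunks, model_name="default"):
    """Yield the JSON frames for one STT stream, in send order."""
    stamp = {"client_req_id": client_req_id}
    yield json.dumps({
        "type": "setup",
        "model_name": model_name,
        "input_format": "pcm",
        "close_ws_on_eos": False,
        **stamp,
    })
    for chunk in pcm_chunks:
        # Audio bytes are base64 encoded on the wire.
        encoded = base64.b64encode(chunk).decode("ascii")
        yield json.dumps({"type": "audio", "audio": encoded, **stamp})
    yield json.dumps({"type": "end_of_stream", **stamp})
```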

Closing a Reusable Socket

After all logical requests have finished, send an unscoped end_of_stream to close the reusable socket:

{"type":"end_of_stream"}

If active requests are still running, the server closes after they complete. If no requests are active, it closes immediately.
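
The only difference between ending one logical request and closing the socket is whether `client_req_id` is present. A minimal sketch, using a hypothetical helper named `end_of_stream`:

```python
import json


def end_of_stream(client_req_id=None):
    """Build a scoped end_of_stream, or an unscoped one when no ID is given."""
    msg = {"type": "end_of_stream"}
    if client_req_id is not None:
        # Scoped: ends one logical request and leaves the socket open.
        msg["client_req_id"] = client_req_id
    return json.dumps(msg)
```

`end_of_stream("req-a")` ends a single request, while `end_of_stream()` produces the unscoped frame that closes the reusable socket.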

Error Handling

Errors include client_req_id when the server can identify the logical request:

{"type":"error","message":"Session already active (req-a).","code":1002,"client_req_id":"req-a"}

Treat an error as terminal for that logical request. Depending on the error and whether other sessions are active, the WebSocket may close after outstanding requests finish.
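
A client can model "terminal for that logical request" by keeping one future per active request and resolving it on either `end_of_stream` or `error`. The sketch below assumes messages shaped like the example above. `RequestFailed` and `settle` are hypothetical names, not SDK APIs.

```python
import asyncio


class RequestFailed(Exception):
    """Raised to the caller whose logical request received an error."""

    def __init__(self, code, message):
        super().__init__(f"[{code}] {message}")
        self.code = code


def settle(pending, msg):
    """Resolve the future of a request that ended or failed.

    `pending` maps client_req_id to an asyncio.Future.
    Returns True if the message settled a request.
    """
    req_id = msg.get("client_req_id")
    future = pending.get(req_id)
    if future is None or msg["type"] not in ("error", "end_of_stream"):
        return False
    # Either outcome is terminal, so stop tracking this ID.
    del pending[req_id]
    if msg["type"] == "error":
        future.set_exception(RequestFailed(msg.get("code"), msg.get("message")))
    else:
        future.set_result(None)
    return True
```

An error without a `client_req_id` is not matched to any request, so `settle` returns `False` and the caller should handle it at the connection level.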

Related

WebSocket Lifecycle

Setup, ready, input, flush, end, and errors.

LLM Tokens to Streaming TTS

Stream generated text while preserving prosody.
