client_req_id; the
server copies that value onto responses so your client can route audio,
text, VAD, and end events back to the right caller.
This is useful when you want to avoid opening a new WebSocket for every
short utterance or when a server needs to process many low-latency TTS
requests in parallel.
How Multiplexing Works
| Mode | How to use it | Behavior |
|---|---|---|
| Single-use | Send setup without close_ws_on_eos: false | The socket closes after end_of_stream. |
| Reusable sequential | Set close_ws_on_eos: false, omit client_req_id | Keep the socket open and process one request at a time. |
| Concurrent multiplexed | Set close_ws_on_eos: false, include client_req_id on every message | Multiple active requests share the socket. |
- Set close_ws_on_eos: false in every setup.
- Generate a unique client_req_id per logical request.
- Include that same client_req_id on setup, input messages, and end_of_stream.
- Route responses by client_req_id.
If you reuse a client_req_id while the previous request with that ID is still
active, the server returns a protocol error.
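The rules above can be sketched as a small client-side router. This is a minimal illustration, not the SDK's implementation: RequestRouter is a hypothetical helper, and the type field and the dictionary message shape are assumptions. Only client_req_id, close_ws_on_eos, setup, and end_of_stream come from the text above.

```python
import uuid
from collections import defaultdict


class RequestRouter:
    """Hypothetical helper: tags outgoing messages and routes responses by client_req_id."""

    def __init__(self):
        self.active = set()             # IDs of requests still in flight
        self.inbox = defaultdict(list)  # client_req_id -> responses received so far

    def new_request(self):
        """Return a setup message carrying a fresh, unique client_req_id."""
        req_id = uuid.uuid4().hex
        self.active.add(req_id)
        # "type" is an assumed field name; close_ws_on_eos=False keeps the socket open.
        return {"type": "setup", "close_ws_on_eos": False, "client_req_id": req_id}

    def tag(self, req_id, message):
        """Attach the request's ID to an input or end_of_stream message."""
        if req_id not in self.active:
            raise KeyError(f"unknown or finished request: {req_id}")
        return {**message, "client_req_id": req_id}

    def route(self, response):
        """File a server response under the request it belongs to."""
        req_id = response.get("client_req_id")
        if req_id not in self.active:
            raise KeyError(f"response for unknown request: {req_id}")
        self.inbox[req_id].append(response)
        return req_id

    def finish(self, req_id):
        """Mark a request as done so its ID is no longer considered active."""
        self.active.discard(req_id)
```

Keeping the set of active IDs on the client makes it harder to reuse an ID by accident while its request is still running, which the server would reject.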
TTS Example
Each audio message contains decoded bytes when using the Python SDK.
If you talk to the WebSocket directly, audio is base64 encoded.
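If you read frames from the WebSocket yourself, the decoding step might look like the sketch below. The type and audio field names are assumptions made for illustration; check the message reference for the actual payload layout.

```python
import base64
import json


def extract_audio(raw_frame):
    """Return audio bytes from a raw JSON text frame, or None for non-audio messages."""
    message = json.loads(raw_frame)
    # "type" and "audio" are assumed field names, not confirmed by the API reference.
    if message.get("type") != "audio":
        return None
    return base64.b64decode(message["audio"])
```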
Wire Transcript
Client
Server
STT Notes
The same client_req_id mechanism exists on STT WebSockets. Use it
only when each audio source is a separate logical stream and your
client can route every audio chunk, flush, and end_of_stream to the
right request.
Closing a Reusable Socket
After all logical requests have finished, send an unscoped end_of_stream (that is, one without a client_req_id) to close the reusable socket.
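As a rough sketch, the difference between a scoped and an unscoped end_of_stream could be expressed as follows. The type field name and the JSON encoding are assumptions; the only detail taken from the text is that the closing message carries no client_req_id.

```python
import json


def end_of_stream(client_req_id=None):
    """Build an end_of_stream message; omit client_req_id to make it unscoped."""
    message = {"type": "end_of_stream"}  # "type" is an assumed field name
    if client_req_id is not None:
        message["client_req_id"] = client_req_id  # scoped: ends one logical request
    return json.dumps(message)
```

Calling end_of_stream("req-1") would end a single logical request, while end_of_stream() builds the unscoped form used to close the socket.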
Error Handling
Errors include client_req_id when the server can identify the
logical request.
Treat an error as terminal for that logical request. Depending on the
error and whether other sessions are active, the WebSocket may close
after outstanding requests finish.
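One way to apply this on the client is sketched below. apply_error is a hypothetical helper, and treating an error without a client_req_id as fatal for every in-flight request is an assumption of this sketch, not documented server behavior.

```python
def apply_error(error, active_requests):
    """Return the set of request IDs that the error terminates.

    active_requests is a set of in-flight client_req_id values; it is
    updated in place.
    """
    req_id = error.get("client_req_id")
    if req_id is not None:
        if req_id in active_requests:
            # Scoped error: terminal for that one logical request only.
            active_requests.discard(req_id)
            return {req_id}
        return set()  # stale or unknown ID: nothing in flight to terminate
    # No ID: this sketch assumes the error affects every request on the socket.
    failed = set(active_requests)
    active_requests.clear()
    return failed
```

Requests that remain in active_requests after a scoped error can keep running on the same socket until they finish or the server closes the connection.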
Related
WebSocket Lifecycle
Setup, ready, input, flush, end, and errors.
LLM Tokens to Streaming TTS
Stream generated text while preserving prosody.