Gradium’s real-time APIs use the same WebSocket lifecycle for TTS and STT:
- Connect with authentication.
- Send a `setup` message.
- Wait for, or lazily receive, `ready`.
- Send input messages (`text` for TTS, `audio` for STT).
- Optionally flush buffered input.
- Send `end_of_stream`.
- Read output until the server sends `end_of_stream` or `error`.
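The lifecycle above can be sketched as a small driver that is independent of any particular WebSocket library. This is a minimal sketch, not the SDK's implementation: it assumes every message is a JSON object carrying its kind in a `type` field, and `run_request`, `send`, and `recv` are hypothetical names standing in for your client's send and receive calls.

```python
import json

TERMINAL_TYPES = ("end_of_stream", "error")


def run_request(send, recv, setup, inputs):
    """Drive one request: setup, inputs, end_of_stream, then read until terminal.

    `send` takes a JSON string; `recv` returns the next server message as a
    JSON string. Both are placeholders for a real WebSocket client.
    """
    send(json.dumps(setup))
    for message in inputs:
        send(json.dumps(message))
    send(json.dumps({"type": "end_of_stream"}))  # assumed message shape

    outputs = []
    while True:
        message = json.loads(recv())
        outputs.append(message)  # `ready` is picked up lazily here
        if message.get("type") in TERMINAL_TYPES:
            return outputs
```

For simplicity the sketch sends all input before reading. A streaming client would normally read concurrently so that output arrives while input is still being sent.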
Endpoints
| Product | WebSocket endpoint | Input | Output |
|---|---|---|---|
| TTS | wss://api.gradium.ai/api/speech/tts | text messages | audio, text, ready, end_of_stream, error |
| STT | wss://api.gradium.ai/api/speech/asr | audio, flush messages | text, end_text, step, flushed, ready, end_of_stream, error |
Authentication
Server-side clients should send the API key in the `x-api-key` header.
Browser clients should use a short-lived token passed as `?token=...`; see
Browser WebSockets.
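As a sketch of the server-side case, the helper below pairs each endpoint from the table above with the `x-api-key` header. `connection_args` is a hypothetical helper, not an SDK function. How you hand the headers to the socket depends on your WebSocket library, so that step is left out.

```python
# Endpoints as listed in the Endpoints table.
WS_URLS = {
    "tts": "wss://api.gradium.ai/api/speech/tts",
    "stt": "wss://api.gradium.ai/api/speech/asr",
}


def connection_args(product, api_key):
    """Return (url, headers) for a server-side connection.

    Pass `headers` to whatever option your WebSocket client uses for
    custom handshake headers.
    """
    if not api_key:
        raise ValueError("an API key is required for server-side clients")
    return WS_URLS[product], {"x-api-key": api_key}
```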
Setup
The first logical message for every request is `setup`.
TTS setup
STT setup
| Field | Applies to | Purpose |
|---|---|---|
| `model_name` | TTS, STT | Model alias. Use `"default"` unless support gives you another value. |
| `json_config` | TTS, STT | Advanced model settings. SDK calls accept a dict; raw WebSocket clients may send an object or JSON string. |
| `client_req_id` | TTS, STT | Correlates messages when running multiple requests on one socket. |
| `close_ws_on_eos` | TTS, STT | Defaults to `true`. Set `false` to keep the socket open after a request. |
| `retry_for_s` | TTS, STT | Optional setup retry window for transient worker allocation failures. |
TTS-only setup fields:
| Field | Purpose |
|---|---|
| `voice_id` | Voice library or custom voice ID. Prefer this for production. |
| `voice` | Voice name fallback, defaulting to `"default"` when no `voice_id` is provided. |
| `output_format` | `wav`, `pcm`, `opus`, `ulaw_8000`, `alaw_8000`, or explicit PCM rates such as `pcm_16000`. |
| `pronunciation_id` | Pronunciation dictionary to apply to this request. |
STT-only setup fields:
| Field | Purpose |
|---|---|
| `input_format` | `pcm`, `wav`, `opus`, `ulaw_8000`, `alaw_8000`, or explicit PCM rates such as `pcm_16000`. |
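To illustrate how the fields in these tables combine, here is a minimal sketch of a setup builder. It assumes the message is a JSON object with `"type": "setup"` alongside the fields. `build_setup` is a hypothetical helper, and the voice ID in the usage example is a placeholder.

```python
# Fields that the tables above mark as applying to both TTS and STT.
COMMON_FIELDS = {
    "model_name",
    "json_config",
    "client_req_id",
    "close_ws_on_eos",
    "retry_for_s",
}


def build_setup(product_fields, **common):
    """Merge product-specific fields with the shared setup fields."""
    unknown = set(common) - COMMON_FIELDS
    if unknown:
        raise ValueError(f"unknown setup fields: {sorted(unknown)}")
    message = {"type": "setup", "model_name": "default"}  # "type" key is assumed
    message.update(common)
    message.update(product_fields)
    return message


# Usage: one TTS setup and one STT setup.
tts_setup = build_setup(
    {"voice_id": "YOUR_VOICE_ID", "output_format": "pcm_16000"},
    close_ws_on_eos=False,
)
stt_setup = build_setup({"input_format": "pcm"})
```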
Ready
After setup, the server sends `ready`. You can wait for this before
sending input, or start sending immediately and let the SDK capture it
while receiving.
TTS ready
STT ready
Use `request_id` in logs and support tickets. For STT, use
`delay_in_frames` when tuning turn-taking or forced flush behavior.
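A raw WebSocket client might pull these two values out of `ready` as sketched below. The sketch assumes `request_id` and `delay_in_frames` are top-level fields of a JSON message whose `type` is `ready`. `parse_ready` is a hypothetical helper.

```python
import json


def parse_ready(raw):
    """Return the fields of interest from a `ready` message, else None."""
    message = json.loads(raw)
    if message.get("type") != "ready":
        return None
    return {
        # Keep this for logs and support tickets.
        "request_id": message.get("request_id"),
        # Only expected for STT; stays None when absent.
        "delay_in_frames": message.get("delay_in_frames"),
    }
```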
Input
TTS accepts `text` messages. STT accepts `audio` messages; for PCM input, 80 ms chunks have these sizes:
| Format | Sample rate | Samples per 80 ms | Bytes per chunk |
|---|---|---|---|
| `pcm` | 24 kHz | 1920 | 3840 |
| `pcm_8000` | 8 kHz | 640 | 1280 |
| `pcm_16000` | 16 kHz | 1280 | 2560 |
| `pcm_48000` | 48 kHz | 3840 | 7680 |
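The chunk sizes in the table follow from simple arithmetic: samples per chunk is the sample rate times 0.08 s, and bytes per chunk is twice that, which is consistent with 16-bit mono PCM. The sketch below reproduces the table and splits a buffer accordingly. The shape of the `audio` message (base64 payload under an `audio` key) is an assumption, and the helper names are hypothetical.

```python
import base64

FRAME_MS = 80
BYTES_PER_SAMPLE = 2  # inferred from the table: bytes = 2 x samples


def chunk_bytes(sample_rate_hz):
    """Bytes in one 80 ms chunk of PCM at the given sample rate."""
    return sample_rate_hz * FRAME_MS // 1000 * BYTES_PER_SAMPLE


def split_pcm(pcm, sample_rate_hz):
    """Split raw PCM bytes into 80 ms chunks; the last one may be shorter."""
    size = chunk_bytes(sample_rate_hz)
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]


def audio_message(chunk):
    """Wrap a chunk as an STT input message (field names are assumed)."""
    return {"type": "audio", "audio": base64.b64encode(chunk).decode("ascii")}
```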
Flush
TTS supports model-level flushing with the `<flush>` tag inside text.
STT flushes with a dedicated `flush` message.
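The two flush styles can be sketched as follows. Only the `<flush>` tag and the existence of a `flush` message come from this page. The JSON shapes, the tag's position at the end of the text, and the helper names are assumptions.

```python
def tts_text_with_flush(text):
    """TTS: embed the <flush> tag in the text itself (placed at the end here)."""
    return {"type": "text", "text": f"{text}<flush>"}


def stt_flush():
    """STT: a standalone flush message; any extra fields are not shown."""
    return {"type": "flush"}
```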
End
Send `end_of_stream` when you are done sending input for a request.
To reuse the socket for further requests, set `close_ws_on_eos: false` in setup and keep sending new setup/input
groups.
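As an illustration of back-to-back requests on one socket, the generator below emits one setup/input/`end_of_stream` group per text. It is a sketch under the assumption that messages are JSON objects with a `type` field. `sequential_tts_groups` is a hypothetical name.

```python
def sequential_tts_groups(texts, voice_id):
    """Yield setup, text, end_of_stream for each text, keeping the socket open."""
    for text in texts:
        yield {
            "type": "setup",
            "model_name": "default",
            "voice_id": voice_id,
            "close_ws_on_eos": False,  # keep the socket open after this request
        }
        yield {"type": "text", "text": text}
        yield {"type": "end_of_stream"}
```

A real client would read each request's output up to the server's `end_of_stream` before or while sending the next group.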
Multiplexing
To run multiple logical requests over one socket:
- Set `close_ws_on_eos: false`.
- Attach a unique `client_req_id` to every message for a request.
- Route every response by its matching `client_req_id`.
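The routing step can be sketched with a small dispatcher that groups incoming messages by `client_req_id`. This is an illustrative sketch, not SDK code: `ResponseRouter` and `tag` are hypothetical names, and the `type` field is assumed.

```python
import json
from collections import defaultdict


def tag(message, client_req_id):
    """Return a copy of an outgoing message carrying the request's ID."""
    return {**message, "client_req_id": client_req_id}


class ResponseRouter:
    """Collect server messages per logical request."""

    def __init__(self):
        self.by_request = defaultdict(list)
        self.finished = set()

    def dispatch(self, raw):
        """Route one raw JSON message and return the request ID it belongs to."""
        message = json.loads(raw)
        req_id = message.get("client_req_id")
        self.by_request[req_id].append(message)
        if message.get("type") == "end_of_stream":
            self.finished.add(req_id)
        return req_id
```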
Errors
WebSocket errors are sent as JSON and then the socket closes. Treat `error` as terminal for that socket. Open a new connection when
retrying. Common codes:
| Code | Meaning |
|---|---|
| 1002 | Protocol error, such as sending input before setup or reusing an active `client_req_id`. |
| 1008 | Policy violation, such as invalid auth, missing subscription, or invalid request policy. |
| 1011 | Internal server error or unexpected session failure. |
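One way to treat `error` as terminal is to turn it into an exception as soon as it is read. The sketch below assumes the error JSON has `type`, `code`, and `message` fields; the exact payload is not shown on this page. `GradiumSocketError` and `raise_on_error` are hypothetical names.

```python
import json

# Meanings taken from the table above.
CLOSE_CODE_MEANINGS = {
    1002: "protocol error",
    1008: "policy violation",
    1011: "internal server error",
}


class GradiumSocketError(Exception):
    """Raised when the server sends an `error` message."""

    def __init__(self, code, detail):
        meaning = CLOSE_CODE_MEANINGS.get(code, "unknown code")
        super().__init__(f"{code} ({meaning}): {detail}")
        self.code = code


def raise_on_error(raw):
    """Parse a server message; raise if it is an error, else return it."""
    message = json.loads(raw)
    if message.get("type") == "error":
        raise GradiumSocketError(message.get("code"), message.get("message"))
    return message
```

After the exception, discard the socket and open a new connection before retrying.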
Next steps
- Text-to-Speech WebSocket: Stream text in and receive audio chunks back.
- Speech-to-Text WebSocket: Stream audio in and receive text, VAD, and flush events.
- Multiplexing: Run several logical requests on one WebSocket.
- Browser WebSockets: Use short-lived tokens without exposing API keys.