Gradium exposes Text-to-Speech over two transports. Both share the same models and voices; pick the transport that matches your input shape and latency needs.
WebSocket vs REST
| If your text is… | Use | Why |
|---|---|---|
| Streamed in token-by-token (LLM output, agent loop, live UI) | WebSocket | Lowest first-byte latency. Send text as it arrives; audio chunks come back as soon as they're generated. |
| A finished block in hand (article, scripted line, batch job) | REST | One HTTP call, no connection to manage. Save the response body to a file and you're done. |
| Many concurrent synthesis requests | WebSocket + Multiplexing | Run several requests over a single socket, matched by `client_req_id`. |
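Multiplexing works by tagging each outgoing request with a client-chosen `client_req_id` and routing each incoming message back to its request by that same id. A minimal routing sketch; the `"client_req_id"` and `"audio"` message field names here are illustrative assumptions, not the documented wire format:

```python
import base64
from collections import defaultdict

def demux_messages(messages):
    """Group audio chunks arriving on one socket by client_req_id.

    `messages` is an iterable of already-decoded JSON messages; the
    "client_req_id" and "audio" field names are assumptions for
    illustration, not the documented schema.
    """
    streams = defaultdict(list)
    for msg in messages:
        req_id = msg["client_req_id"]  # the id we chose when sending the request
        streams[req_id].append(base64.b64decode(msg["audio"]))
    # One contiguous audio buffer per concurrent request.
    return {req_id: b"".join(chunks) for req_id, chunks in streams.items()}
```

Interleaved chunks from different requests are safe: ordering only matters within a single `client_req_id`.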
What both transports share
- Voices: any `voice_id` from the Voice Library works on both.
- Output formats: PCM, WAV, Opus, and the telephony codecs (`ulaw_8000`, `alaw_8000`).
- Voice settings: `temp`, `cfg_coef`, `padding_bonus`, `rewrite_rules`, `pronunciation_id`. See Voice Settings.
- Pronunciation dictionaries: pass `pronunciation_id` in either transport.
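Because these settings are shared, one settings object can be attached to either a WebSocket setup message or a REST request body. A sketch of assembling one; the key names come from the list above, but the default values and the shape of the top-level object are assumptions (see Voice Settings for the real schema):

```python
def build_voice_settings(voice_id, temp=0.7, cfg_coef=1.5,
                         padding_bonus=0.0, pronunciation_id=None):
    """Assemble a voice-settings dict reusable on either transport.

    Key names follow the shared settings list; the defaults and the
    overall structure are illustrative assumptions, not documented values.
    """
    settings = {
        "voice_id": voice_id,
        "temp": temp,
        "cfg_coef": cfg_coef,
        "padding_bonus": padding_bonus,
    }
    # A pronunciation dictionary is optional and works in both transports.
    if pronunciation_id is not None:
        settings["pronunciation_id"] = pronunciation_id
    return settings
```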
What’s transport-specific
- `<flush>` and `<break time="..." />` tags are processed by the model in both transports, but they are only meaningfully useful when you're streaming text in over the WebSocket.
- WebSocket-only: setup-message stream controls (`send_setup_on_start`, `wait_for_ready_on_start`), multiplexing, and in-stream `flush`. See WebSocket Stream Options.
- REST-only: the `only_audio` toggle, which chooses between raw audio bytes and an NDJSON stream that mirrors the WebSocket protocol.
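When `only_audio` is off, the REST response body is NDJSON: one JSON event per line, mirroring the WebSocket messages. A sketch of decoding it; assuming each audio event carries a base64 `"audio"` field, which is an illustrative assumption rather than the documented schema:

```python
import base64
import json

def decode_ndjson_audio(body: bytes) -> bytes:
    """Concatenate the audio carried by an NDJSON response body.

    Assumes one JSON object per line and a base64 "audio" field on
    audio events; both are illustrative assumptions.
    """
    audio = bytearray()
    for line in body.splitlines():
        if not line.strip():
            continue  # tolerate blank lines between events
        event = json.loads(line)
        if "audio" in event:
            audio += base64.b64decode(event["audio"])
        # Non-audio events (timestamps, end-of-stream, etc.) pass through here.
    return bytes(audio)
```

With `only_audio` enabled you skip all of this: the body is raw audio bytes, so you can write it straight to a file.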
Next steps
Use the WebSocket API
SDK and direct WebSocket usage, streaming output, flush, timestamps.
Use the REST API
One-shot synthesis with a single HTTP POST.
Voice settings
Speed, temperature, voice similarity, rewrite rules.
Voice Library
Browse flagship voices or create your own clones.