WebSocket vs REST
| If your text is… | Use | Why |
|---|---|---|
| Streamed in token-by-token (LLM output, agent loop, live UI) | WebSocket | Lowest first-byte latency. Send text as it arrives; audio chunks come back as soon as they’re generated. |
| A finished block in hand (article, scripted line, batch job) | REST | One HTTP call, no connection to manage. Save the response body to a file and you’re done. |
| Many concurrent synthesis requests | WebSocket + Multiplexing | Run several requests over a single socket, matched by `client_req_id`. |
| Browser or mobile app audio | Browser WebSockets | Use short-lived tokens instead of exposing an API key. |
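
To make the multiplexing row concrete, here is a minimal sketch of how a client might sort interleaved messages from one socket back into per-request streams. Only the `client_req_id` field name comes from this page; the message shape (a dict with `type` and `seq` keys) and the helper name `demux_by_request` are assumptions for illustration, not the documented protocol.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def demux_by_request(
    messages: Iterable[Dict[str, Any]],
) -> Dict[str, List[Dict[str, Any]]]:
    """Group interleaved messages by client_req_id, preserving arrival order."""
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for message in messages:
        # "client_req_id" is the field named in the table above.
        req_id = message.get("client_req_id")
        if req_id is None:
            # Skip messages that are not tied to a specific request.
            continue
        grouped[req_id].append(message)
    return dict(grouped)


# Hypothetical messages as they might arrive interleaved on one socket.
incoming = [
    {"client_req_id": "a", "type": "audio", "seq": 0},
    {"client_req_id": "b", "type": "audio", "seq": 0},
    {"client_req_id": "a", "type": "audio", "seq": 1},
]
by_request = demux_by_request(incoming)
```

Grouping after the fact keeps the sketch short; a real client would more likely push each message onto a per-request queue as it arrives so playback can start before the stream ends.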
What both transports share
- Voices: any `voice_id` from the Voice Library works on both.
- Output formats: PCM, WAV, Opus, and the telephony codecs (`ulaw_8000`, `alaw_8000`).
- Voice settings: `temp`, `cfg_coef`, `padding_bonus`, `rewrite_rules`, `pronunciation_id`. See Voice Settings.
- Pronunciation dictionaries: pass `pronunciation_id` in either transport.
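
Because these options are shared, one configuration object can in principle feed either transport. The sketch below collects them in a plain dict and rejects setting names that are not listed above. The flat layout, the `output_format` key, the helper name `build_shared_config`, and the example values are all assumptions; this page does not specify the request schema or valid value ranges.

```python
from typing import Any, Dict

# Setting names as listed on this page; value types and ranges are not given here.
SHARED_VOICE_SETTINGS = {
    "temp",
    "cfg_coef",
    "padding_bonus",
    "rewrite_rules",
    "pronunciation_id",
}


def build_shared_config(
    voice_id: str, output_format: str, **voice_settings: Any
) -> Dict[str, Any]:
    """Build a transport-agnostic config dict (hypothetical layout)."""
    unknown = set(voice_settings) - SHARED_VOICE_SETTINGS
    if unknown:
        raise ValueError(f"unknown voice settings: {sorted(unknown)}")
    # "output_format" is an assumed key name, not taken from the API reference.
    return {"voice_id": voice_id, "output_format": output_format, **voice_settings}


# Placeholder voice ID and an illustrative temperature value.
config = build_shared_config("example-voice-id", "ulaw_8000", temp=0.7)
```

Validating names up front catches typos such as `temperature` before a request is sent, whichever transport ends up carrying it.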
What’s transport-specific
- `<flush>` and `<break time="..." />` tags are processed by the model in both transports, but they are only useful in practice when you’re streaming text in over the WebSocket.
- WebSocket-only: setup-message stream controls (`send_setup_on_start`, `wait_for_ready_on_start`), multiplexing, in-stream `flush`, browser tokens. See WebSocket Lifecycle.
- REST-only: `only_audio` toggle to choose between raw audio bytes and an NDJSON stream that mirrors the WebSocket protocol.
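
When the REST response is an NDJSON stream rather than raw audio, the client has to split the body into one JSON document per line, including lines that straddle network chunk boundaries. The following is a generic NDJSON reader sketched under that assumption. The message fields shown (`type`, `seq`) are hypothetical placeholders, not the documented schema, and `iter_ndjson` is an illustrative name.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per newline-delimited line."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last newline is complete; keep the remainder.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
    # Handle a final line that was not newline-terminated.
    if buffer.strip():
        yield json.loads(buffer)


# Hypothetical response body, split mid-message as a network read might.
chunks = [b'{"type": "audio", "se', b'q": 0}\n{"type": "end"}\n']
messages = list(iter_ndjson(chunks))
```

Any byte-chunk iterator can be passed in, for example the streaming body iterator of an HTTP client, so the parser does not depend on a particular library.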
Next steps
- Use the WebSocket API: SDK and direct WebSocket usage, streaming output, flush, timestamps.
- Use the REST API: one-shot synthesis with a single HTTP POST.
- LLM to TTS: stream generated text into TTS while preserving natural prosody.
- Voice settings: speed, temperature, voice similarity, rewrite rules.
- Voice Library: browse flagship voices or create your own clones.