x-api-key auth, Gradium voice_id values, and Gradium WebSocket messages.
For realtime apps, Gradium also gives you STT semantic VAD and adaptive
delay controls in the same API surface.
Endpoint Swap
| Flow | Existing integration | Gradium |
|---|---|---|
| One-shot TTS | POST https://api.cartesia.ai/tts/bytes | POST https://api.gradium.ai/api/post/speech/tts |
| Streaming TTS | wss://api.cartesia.ai/tts/websocket | wss://api.gradium.ai/api/speech/tts |
| Streaming STT | wss://api.cartesia.ai/stt/websocket | wss://api.gradium.ai/api/speech/asr |
Auth Mapping
| Existing auth/version field | Gradium |
|---|---|
| Authorization: Bearer $CARTESIA_API_KEY | x-api-key: $GRADIUM_API_KEY |
| X-API-Key | x-api-key |
| Provider API version header | Not required by Gradium |
| access_token for browser clients | Gradium short-lived ?token=...; see Browser WebSockets. |
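
The header swap in the table above can be sketched as a small helper for server-side clients. This is a minimal illustration, not an official client: the function name `to_gradium_headers` is hypothetical, and `Cartesia-Version` is assumed to be the name of the provider version header in your existing integration.

```python
def to_gradium_headers(headers: dict[str, str], gradium_api_key: str) -> dict[str, str]:
    """Return a copy of `headers` with provider auth replaced by Gradium auth."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    # "cartesia-version" is an assumed name for the provider version header.
    dropped = {"authorization", "x-api-key", "cartesia-version"}
    migrated = {k: v for k, v in headers.items() if k.lower() not in dropped}
    # Gradium server-side auth uses a single x-api-key header.
    migrated["x-api-key"] = gradium_api_key
    return migrated
```

Headers unrelated to auth or versioning, such as `Content-Type`, pass through unchanged.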
Gradium POST TTS
Gradium’s POST endpoint returns audio bytes when only_audio is
true, so the rest of your “write this response to a file or player”
code can stay the same.
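
A one-shot request might look like the sketch below, using only the Python standard library. The endpoint, the `x-api-key` header, and the `text`, `voice_id`, `output_format`, and `only_audio` fields come from this guide; sending them as a JSON body is an assumption, and the helper names are hypothetical.

```python
import json
import os
import urllib.request

GRADIUM_TTS_URL = "https://api.gradium.ai/api/post/speech/tts"


def build_tts_request(text, voice_id, api_key, output_format="wav"):
    """Build (but do not send) a POST request for one-shot TTS."""
    # Assumption: the endpoint accepts these fields as a JSON body.
    body = {
        "text": text,
        "voice_id": voice_id,
        "output_format": output_format,
        "only_audio": True,  # ask for raw audio bytes in the response
    }
    return urllib.request.Request(
        GRADIUM_TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, voice_id, path):
    """Send the request and write the returned audio bytes to `path`."""
    req = build_tts_request(text, voice_id, os.environ["GRADIUM_API_KEY"])
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

Calling `synthesize_to_file("Hello there.", "<your voice_id>", "hello.wav")` would then write the response body straight to disk, which is the part of an existing integration that should not need to change.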
TTS Field Mapping
| Existing field | Gradium field |
|---|---|
| transcript | text |
| voice.id | voice_id |
| model_id | model_name |
| output_format.container / encoding / sample rate | output_format string |
| language or localized voice choice | Voice language plus optional json_config.language / rewrite rules |
| Pronunciation dictionary | pronunciation_id |
The Gradium output_format string takes values such as wav, pcm, opus,
ulaw_8000, alaw_8000, or explicit PCM rates such as pcm_16000.
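
Flattening a structured format object into one of these strings could be sketched as follows. The input keys `container`, `encoding`, and `sample_rate` and the encoding names `pcm_s16le`, `pcm_mulaw`, and `pcm_alaw` are assumptions about the existing integration. The `pcm_<rate>` pattern is extrapolated from `pcm_16000`, so confirm that the rate you need is supported before relying on it.

```python
def to_gradium_output_format(fmt: dict) -> str:
    """Map a structured output_format dict to a Gradium format string."""
    container = fmt.get("container", "raw")
    encoding = fmt.get("encoding", "")
    sample_rate = fmt.get("sample_rate")

    if container == "wav":
        return "wav"
    # Telephony encodings are only listed at 8 kHz in this guide.
    if encoding == "pcm_mulaw" and sample_rate == 8000:
        return "ulaw_8000"
    if encoding == "pcm_alaw" and sample_rate == 8000:
        return "alaw_8000"
    if encoding == "pcm_s16le":
        # Assumption: explicit rates follow the pcm_<rate> naming pattern.
        return f"pcm_{sample_rate}" if sample_rate else "pcm"
    raise ValueError(f"no Gradium format mapped for {fmt!r}")
```

Raising on unmapped combinations keeps a silent format mismatch from reaching the downstream player.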
WebSocket TTS Migration
With Gradium, the client sends setup first, then one or more text messages, then
end_of_stream. Use client_req_id when you need to correlate
concurrent logical requests on one socket.
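
The message order can be sketched as a generator that yields the JSON frames for one logical request. The message names and the `client_req_id` and `close_ws_on_eos` fields come from this guide. The `type` discriminator key, the placement of `close_ws_on_eos` inside `setup`, and tagging every message with `client_req_id` are assumptions to check against the Gradium TTS WebSocket guide.

```python
import json


def tts_messages(chunks, voice_id, client_req_id=None, keep_open=False, output_format="pcm"):
    """Yield JSON strings in protocol order: setup, text..., end_of_stream."""

    def tag(msg):
        # Attach the correlation id only when the caller asked for one.
        if client_req_id is not None:
            msg["client_req_id"] = client_req_id
        return json.dumps(msg)

    setup = {"type": "setup", "voice_id": voice_id, "output_format": output_format}
    if keep_open:
        # Keep the socket open after end_of_stream so it can carry more requests.
        setup["close_ws_on_eos"] = False
    yield tag(setup)

    for chunk in chunks:
        yield tag({"type": "text", "text": chunk})

    yield tag({"type": "end_of_stream"})
```

Each yielded string can be passed to whatever WebSocket client you already use, for example `for frame in tts_messages(["Hello, ", "world."], voice_id): ws.send(frame)`.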
client_req_id and close_ws_on_eos: false are optional. Use them when you want
multiple utterances on one socket.
| Existing streaming step | Gradium streaming step |
|---|---|
| Connect to the provider WebSocket | Connect to wss://api.gradium.ai/api/speech/tts |
| Start a logical request | Send Gradium setup, optionally with client_req_id |
| Continue with more text | Send more text messages for the same request |
| Finish the request | Send end_of_stream for that client_req_id |
| Receive chunk / audio responses | Receive audio messages |
| Receive timestamps | Receive TTS text timestamp messages at segment granularity |
| Cancel context | Stop playback and close the socket, or isolate requests with client_req_id |
TTS Feature Mapping
| Existing feature | Gradium equivalent | Migration note |
|---|---|---|
| context_id | client_req_id | Use for correlation and multiplexing, not persistent prosody across unrelated requests. |
| continue: true/false | Multiple text messages then end_of_stream | Keep chunks as complete words/phrases; use <flush> at natural boundaries. |
| Concurrent contexts | Multiplexing | Set close_ws_on_eos: false and route by client_req_id. |
| transcript | text | Same concept, different field name. |
| Structured output_format | output_format string | Choose the Gradium format closest to your downstream player. |
| Word/phoneme timestamps | Segment timestamps | Gradium documents segment-level text timestamps. |
| Access tokens for browser | Gradium GET /api/api-keys/token and ?token=... | See Browser WebSockets. |
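
Routing by client_req_id on the receiving side could be sketched as a small demultiplexer. This assumes that server messages carry a `type` key, that audio messages hold base64 data in an `audio` field, and that the server echoes `client_req_id` and sends `end_of_stream` per request. None of those wire details are spelled out in this guide, and the class name `TtsDemux` is hypothetical.

```python
import base64
import json
from collections import defaultdict


class TtsDemux:
    """Collect audio per client_req_id from one multiplexed socket."""

    def __init__(self):
        self.audio = defaultdict(bytearray)  # client_req_id -> audio bytes so far
        self.finished = set()                # client_req_ids that reached end_of_stream

    def feed(self, raw: str) -> None:
        msg = json.loads(raw)
        req_id = msg.get("client_req_id")
        kind = msg.get("type")
        if kind == "audio":
            # Assumption: audio payloads are base64 strings in an "audio" field.
            self.audio[req_id] += base64.b64decode(msg["audio"])
        elif kind == "end_of_stream":
            self.finished.add(req_id)
        # Other message types (for example text timestamps) are ignored here.
```

Because each request is keyed separately, cancelling one utterance reduces to dropping its buffer and ignoring further messages for that id.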
Gradium Streaming STT
Gradium STT uses JSON messages for direct WebSocket audio, with base64 payloads. It returns text, end_text, step, flushed, and
end_of_stream messages.
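
The client side of that exchange might be built with helpers like these. The base64 audio payload, `input_format`, `flush` with a `flush_id`, and `end_of_stream` come from this guide. The `type` key, the `audio` field name, and sending `input_format` in a `setup` message are assumptions, and the helper names are hypothetical.

```python
import base64
import json


def stt_setup(input_format: str = "pcm_16000") -> str:
    """First message: declare the audio format instead of connection params."""
    return json.dumps({"type": "setup", "input_format": input_format})


def stt_audio(chunk: bytes) -> str:
    """Wrap one raw audio chunk as a base64 JSON message."""
    return json.dumps({"type": "audio", "audio": base64.b64encode(chunk).decode("ascii")})


def stt_flush(flush_id) -> str:
    """Ask the server to finalize pending text; the same id is echoed in `flushed`."""
    return json.dumps({"type": "flush", "flush_id": flush_id})


def stt_end() -> str:
    """Signal that no more audio will be sent."""
    return json.dumps({"type": "end_of_stream"})
```

An adapter that previously wrote binary frames to the socket would call `stt_audio(frame)` and send the resulting text frame instead.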
| Existing STT concept | Gradium STT equivalent |
|---|---|
| Binary audio frames | Base64 audio JSON messages |
| encoding and sample_rate connection params | input_format such as pcm_16000, pcm_24000, ulaw_8000 |
| transcript.is_final | text plus end_text segment finalization |
| finalize | flush with a flush_id |
| flush_done | flushed with the same flush_id |
| done | end_of_stream |
| Word timestamps | Segment timestamps in public Gradium docs |
| External VAD / turn events | Semantic VAD step messages |
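
On the receiving side, the mapping above suggests a handler along these lines. It assumes server messages carry a `type` key, that `text` messages hold their content in a `text` field, and that an `end_text` closes whatever text arrived since the previous one. The payload of `step` messages is not described in this guide, so the sketch only counts them. The class name `SttCollector` is hypothetical.

```python
import json


class SttCollector:
    """Assemble finalized transcript segments from Gradium STT messages."""

    def __init__(self):
        self.pending = []         # text received since the last end_text
        self.segments = []        # finalized segments, in arrival order
        self.flushed_ids = set()  # flush_ids the server has acknowledged
        self.steps = 0            # number of semantic VAD step messages seen
        self.done = False

    def feed(self, raw: str) -> None:
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "text":
            self.pending.append(msg.get("text", ""))
        elif kind == "end_text":
            # Replaces the old transcript.is_final check: the segment is closed.
            if self.pending:
                self.segments.append(" ".join(self.pending))
                self.pending = []
        elif kind == "flushed":
            self.flushed_ids.add(msg.get("flush_id"))
        elif kind == "step":
            self.steps += 1
        elif kind == "end_of_stream":
            self.done = True
```

Code that previously waited for a `flush_done` event can instead wait until the id it sent appears in `flushed_ids`.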
Adapter Checklist
- Replace provider URLs with matching Gradium endpoints.
- Remove provider API-version headers; Gradium does not require them.
- Change auth to x-api-key on servers or temporary ?token=... for browser WebSockets.
- Rename transcript to text.
- Flatten voice.id to voice_id.
- Convert structured output_format into a Gradium format string.
- For context-style routing, map the request identifier to client_req_id when you need response correlation.
- For STT direct WebSocket clients, base64 encode audio inside JSON messages.
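
The field-level items in this checklist can be collected into one adapter function, sketched here under a few assumptions. The input shape (`transcript`, `voice.id`, `context_id`) mirrors the existing-field columns in this guide. `voice_map` is a hypothetical lookup table you maintain from existing voice ids to Gradium voice_id values, and `model_name` is passed explicitly because model identifiers are not assumed to carry over between providers.

```python
def to_gradium_tts_payload(old: dict, voice_map: dict, output_format: str, model_name=None) -> dict:
    """Translate an existing TTS request dict into Gradium field names."""
    payload = {
        "text": old["transcript"],                  # transcript -> text
        "voice_id": voice_map[old["voice"]["id"]],  # voice.id -> voice_id, via lookup
        "output_format": output_format,             # already flattened to a string
    }
    if model_name is not None:
        payload["model_name"] = model_name          # replaces model_id
    if "context_id" in old:
        # Only needed when responses must be correlated on a shared socket.
        payload["client_req_id"] = old["context_id"]
    return payload
```

A missing entry in `voice_map` raises `KeyError`, which surfaces unmapped voices during migration rather than at playback time.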
Next steps
Gradium TTS WebSocket guide
Streaming setup messages, audio messages, flush, and timestamps.
Multiplexing
Replace context-style routing with client_req_id.
Browser WebSockets
Use short-lived tokens without exposing API keys.
Telephony audio
Map low-sample-rate PCM, mu-law, and A-law formats.