Migrate from Cartesia

If your application already has a provider adapter, keep that adapter shape and move the implementation to Gradium endpoints, x-api-key auth, Gradium voice_id values, and Gradium WebSocket messages. For realtime apps, Gradium also gives you STT semantic VAD and adaptive delay controls in the same API surface.

Endpoint Swap

Flow	Existing integration	Gradium
One-shot TTS	`POST https://api.cartesia.ai/tts/bytes`	`POST https://api.gradium.ai/api/post/speech/tts`
Streaming TTS	`wss://api.cartesia.ai/tts/websocket`	`wss://api.gradium.ai/api/speech/tts`
Streaming STT	`wss://api.cartesia.ai/stt/websocket`	`wss://api.gradium.ai/api/speech/asr`

Auth Mapping

Existing auth/version field	Gradium
`Authorization: Bearer $CARTESIA_API_KEY`	`x-api-key: $GRADIUM_API_KEY`
`X-API-Key`	`x-api-key`
Provider API version header	Not required by Gradium
`access_token` for browser clients	Gradium short-lived `?token=...`; see Browser WebSockets.
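
If your adapter builds headers in one place, the auth swap can be sketched as a small pure function. This is a minimal sketch, not an official helper: the function name `to_gradium_headers` is hypothetical, and the `Cartesia-Version` header name is an assumption about the existing integration.

```python
def to_gradium_headers(headers: dict[str, str], gradium_api_key: str) -> dict[str, str]:
    """Return a copy of `headers` with provider auth/version fields replaced by x-api-key."""
    # Header names are compared case-insensitively.
    # "cartesia-version" is an assumed name for the provider API version header.
    dropped = {"authorization", "x-api-key", "cartesia-version"}
    kept = {name: value for name, value in headers.items() if name.lower() not in dropped}
    kept["x-api-key"] = gradium_api_key
    return kept
```

Unrelated headers such as `Content-Type` pass through unchanged, so the function can wrap whatever header dictionary the adapter already produces.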

Gradium POST TTS

Gradium’s POST endpoint returns audio bytes when only_audio is true, so the rest of your “write this response to a file or player” code can stay the same.

Gradium

curl -L -X POST https://api.gradium.ai/api/post/speech/tts \
  -H "x-api-key: $GRADIUM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from Gradium.",
    "voice_id": "YTpq7expH9539ERJ",
    "output_format": "wav",
    "only_audio": true
  }' \
  > output.wav
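
The same request can be sketched in Python with only the standard library. The endpoint, header, and body fields come from the curl command above; the function names `build_tts_request` and `synthesize_to_file` are hypothetical, and error handling is left out.

```python
import json
import urllib.request

GRADIUM_TTS_URL = "https://api.gradium.ai/api/post/speech/tts"


def build_tts_request(api_key: str, text: str, voice_id: str,
                      output_format: str = "wav") -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = {
        "text": text,
        "voice_id": voice_id,
        "output_format": output_format,
        "only_audio": True,  # ask for raw audio bytes in the response body
    }
    return urllib.request.Request(
        GRADIUM_TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request: urllib.request.Request, path: str) -> None:
    """Send the request and write the returned audio bytes to `path`."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Keeping request construction separate from the network call makes the adapter easy to unit test without touching the API.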

TTS Field Mapping

Existing field	Gradium field
`transcript`	`text`
`voice.id`	`voice_id`
`model_id`	`model_name`
`output_format.container` / encoding / sample rate	`output_format` string
`language` or localized voice choice	Voice language plus optional `json_config.language` / rewrite rules
Pronunciation dictionary	`pronunciation_id`

Gradium uses a compact output_format string: wav, pcm, opus, ulaw_8000, alaw_8000, or explicit PCM rates such as pcm_16000.
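
The field mapping above can be sketched as a body translator. Several things here are assumptions rather than documented behavior: the shape of the existing request (`container`, `encoding`, `sample_rate`, and values such as `pcm_s16le` or `pcm_mulaw`), the idea that Gradium `pcm` output corresponds to 16-bit PCM, and the hypothetical `voice_map` lookup from old voice IDs to Gradium `voice_id` values. Check any emitted `pcm_<rate>` string against the formats Gradium actually lists.

```python
def to_gradium_format(output_format: dict) -> str:
    """Collapse a structured output_format into a Gradium format string."""
    container = output_format.get("container")
    encoding = output_format.get("encoding")
    sample_rate = output_format.get("sample_rate")
    if container == "wav":
        return "wav"
    if encoding == "pcm_mulaw" and sample_rate == 8000:
        return "ulaw_8000"
    if encoding == "pcm_alaw" and sample_rate == 8000:
        return "alaw_8000"
    if encoding == "pcm_s16le":  # assumed to match Gradium's PCM output
        return f"pcm_{sample_rate}" if sample_rate else "pcm"
    raise ValueError(f"no Gradium format chosen for {output_format!r}")


def to_gradium_tts_body(old: dict, voice_map: dict[str, str],
                        model_name: str = "default") -> dict:
    """Translate an existing TTS request body into Gradium POST fields."""
    return {
        "text": old["transcript"],                 # transcript -> text
        "voice_id": voice_map[old["voice"]["id"]],  # voice.id -> Gradium voice_id
        "model_name": model_name,                  # model_id values do not carry over
        "output_format": to_gradium_format(old["output_format"]),
        "only_audio": True,
    }
```

Raising on an unmapped format is a deliberate choice in this sketch: a silent fallback would produce audio the downstream player may not decode.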

WebSocket TTS Migration

With Gradium, the client sends setup first, then one or more text messages, then end_of_stream. Use client_req_id when you need to correlate concurrent logical requests on one socket.

Gradium messages

{"type":"setup","voice_id":"YTpq7expH9539ERJ","model_name":"default","output_format":"pcm","client_req_id":"ctx-1","close_ws_on_eos":false}
{"type":"text","text":"Hello from Gradium.","client_req_id":"ctx-1"}
{"type":"end_of_stream","client_req_id":"ctx-1"}

For a single streaming utterance, client_req_id and close_ws_on_eos: false are optional. Use them when you want multiple utterances on one socket.
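
The message order can be sketched as a generator that yields the JSON strings to send, independent of any WebSocket library. The message fields mirror the example above; the function name `tts_messages` is hypothetical.

```python
import json
from typing import Iterable, Iterator, Optional


def tts_messages(voice_id: str, chunks: Iterable[str],
                 client_req_id: Optional[str] = None,
                 output_format: str = "pcm",
                 model_name: str = "default") -> Iterator[str]:
    """Yield setup, text, and end_of_stream messages for one logical request."""
    setup = {"type": "setup", "voice_id": voice_id,
             "model_name": model_name, "output_format": output_format}
    tag = {}
    if client_req_id is not None:
        # Only tag messages and keep the socket open when multiplexing.
        tag = {"client_req_id": client_req_id}
        setup["client_req_id"] = client_req_id
        setup["close_ws_on_eos"] = False
    yield json.dumps(setup)
    for chunk in chunks:
        yield json.dumps({"type": "text", "text": chunk, **tag})
    yield json.dumps({"type": "end_of_stream", **tag})
```

An adapter would pass each yielded string to its WebSocket client's send call, for example `for message in tts_messages(voice_id, ["Hello from Gradium."], "ctx-1"): ...`.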

Existing streaming step	Gradium streaming step
Connect to the provider WebSocket	Connect to `wss://api.gradium.ai/api/speech/tts`
Start a logical request	Send Gradium `setup`, optionally with `client_req_id`
Continue with more text	Send more `text` messages for the same request
Finish the request	Send `end_of_stream` for that `client_req_id`
Receive `chunk` / audio responses	Receive `audio` messages
Receive timestamps	Receive TTS `text` timestamp messages at segment granularity
Cancel context	Stop playback and close the socket, or isolate requests with `client_req_id`

TTS Feature Mapping

Existing feature	Gradium equivalent	Migration note
`context_id`	`client_req_id`	Use for correlation and multiplexing, not persistent prosody across unrelated requests.
`continue: true/false`	Multiple `text` messages then `end_of_stream`	Keep chunks as complete words/phrases; use `<flush>` at natural boundaries.
Concurrent contexts	Multiplexing	Set `close_ws_on_eos: false` and route by `client_req_id`.
`transcript`	`text`	Same concept, different field name.
Structured `output_format`	`output_format` string	Choose the Gradium format closest to your downstream player.
Word/phoneme timestamps	Segment timestamps	Gradium documents segment-level text timestamps.
Access tokens for browser	Gradium `GET /api/api-keys/token` and `?token=...`	See Browser WebSockets.
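
On the receiving side, routing by `client_req_id` can be sketched as a small demultiplexer. The incoming message shapes are assumptions that this page does not confirm: the sketch supposes each server message echoes `client_req_id`, that `audio` messages carry base64 data in an `audio` field, and that the server marks completion with an `end_of_stream` message. The class name `TtsDemux` is hypothetical.

```python
import base64
import json
from collections import defaultdict


class TtsDemux:
    """Collect audio per client_req_id from messages arriving on one socket."""

    def __init__(self) -> None:
        self.audio = defaultdict(bytearray)  # client_req_id -> audio bytes so far
        self.finished = set()                # client_req_ids that have completed

    def feed(self, raw: str) -> None:
        message = json.loads(raw)
        req_id = message.get("client_req_id")
        kind = message.get("type")
        if kind == "audio":
            # Assumed payload field: base64 audio under "audio".
            self.audio[req_id].extend(base64.b64decode(message["audio"]))
        elif kind == "end_of_stream":
            # Assumed completion signal for this request.
            self.finished.add(req_id)
        # Other message types (for example timestamps) are ignored here.
```

Where the old adapter kept one buffer per `context_id`, the same buffers can be keyed by `client_req_id` instead.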

Gradium Streaming STT

For direct WebSocket clients, Gradium STT expects audio as base64 payloads inside JSON messages rather than binary frames. It returns text, end_text, step, flushed, and end_of_stream messages.

Gradium STT

{"type":"setup","model_name":"default","input_format":"pcm","json_config":{"language":"en"}}
{"type":"audio","audio":"base64_encoded_audio"}
{"type":"flush","flush_id":1}
{"type":"end_of_stream"}
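
If the existing client sends binary frames, the change amounts to wrapping each chunk in a JSON message. A minimal sketch, using the message fields from the example above; the function name `stt_messages` and the default chunk size are arbitrary choices for illustration, not documented values.

```python
import base64
import json
from typing import Iterator


def stt_messages(pcm: bytes, chunk_bytes: int = 3840,
                 input_format: str = "pcm", language: str = "en") -> Iterator[str]:
    """Yield setup, base64 audio, and end_of_stream messages for one audio buffer."""
    yield json.dumps({
        "type": "setup",
        "model_name": "default",
        "input_format": input_format,
        "json_config": {"language": language},
    })
    for start in range(0, len(pcm), chunk_bytes):
        chunk = pcm[start:start + chunk_bytes]
        # Audio travels as base64 text inside JSON, not as a binary frame.
        yield json.dumps({"type": "audio",
                          "audio": base64.b64encode(chunk).decode("ascii")})
    yield json.dumps({"type": "end_of_stream"})
```

For live microphone input, the same `audio` message shape applies per captured chunk; only the loop source changes.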

Existing STT concept	Gradium STT equivalent
Binary audio frames	Base64 `audio` JSON messages
`encoding` and `sample_rate` connection params	`input_format` such as `pcm_16000`, `pcm_24000`, `ulaw_8000`
`transcript.is_final`	`text` plus `end_text` segment finalization
`finalize`	`flush` with a `flush_id`
`flush_done`	`flushed` with the same `flush_id`
`done`	`end_of_stream`
Word timestamps	Segment-level timestamps (as described in the public Gradium docs)
External VAD / turn events	Semantic VAD `step` messages
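
The mapping from `is_final` flags to `text` plus `end_text` can be sketched as a collector that closes a segment whenever `end_text` arrives. The message field names beyond `type` and `flush_id` are assumptions: this sketch supposes `text` messages carry their content in a `text` field and that joining pieces with a space is acceptable. The class name `TranscriptCollector` is hypothetical.

```python
import json


class TranscriptCollector:
    """Accumulate STT text into finalized segments."""

    def __init__(self) -> None:
        self.pending: list[str] = []   # text received since the last end_text
        self.segments: list[str] = []  # finalized segments
        self.flushed_ids: set = set()  # flush_id values acknowledged by the server
        self.done = False

    def feed(self, raw: str) -> None:
        message = json.loads(raw)
        kind = message.get("type")
        if kind == "text":
            self.pending.append(message.get("text", ""))  # assumed field name
        elif kind == "end_text":
            # end_text plays the role of the old is_final flag.
            if self.pending:
                self.segments.append(" ".join(self.pending).strip())
                self.pending = []
        elif kind == "flushed":
            self.flushed_ids.add(message.get("flush_id"))
        elif kind == "end_of_stream":
            self.done = True
        # "step" (semantic VAD) messages are ignored in this sketch.
```

Code that previously waited for `flush_done` can instead wait until the `flush_id` it sent appears in `flushed_ids`.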

Adapter Checklist

Replace provider URLs with matching Gradium endpoints.
Remove provider API-version headers; Gradium does not require them.
Change auth to x-api-key on servers, or to a short-lived ?token=... for browser WebSockets.
Rename transcript to text.
Flatten voice.id to voice_id.
Convert structured output_format into a Gradium format string.
For context-style routing, map the request identifier to client_req_id when you need response correlation.
For STT direct WebSocket clients, base64 encode audio inside JSON messages.

Next steps

Gradium TTS WebSocket guide

Streaming setup messages, audio messages, flush, and timestamps.

Multiplexing

Replace context-style routing with client_req_id.

Browser WebSockets

Use short-lived tokens without exposing API keys.

Telephony audio

Map low-sample-rate PCM, mu-law, and A-law formats.
