If your application already has a provider adapter, keep the adapter's interface and change only its implementation: Gradium endpoints, x-api-key auth, Gradium voice_id values, and Gradium WebSocket messages. For realtime apps, Gradium also exposes STT semantic VAD and adaptive delay controls through the same API.

Endpoint Swap

Flow | Existing integration | Gradium
One-shot TTS | POST https://api.cartesia.ai/tts/bytes | POST https://api.gradium.ai/api/post/speech/tts
Streaming TTS | wss://api.cartesia.ai/tts/websocket | wss://api.gradium.ai/api/speech/tts
Streaming STT | wss://api.cartesia.ai/stt/websocket | wss://api.gradium.ai/api/speech/asr

Auth Mapping

Existing auth/version field | Gradium
Authorization: Bearer $CARTESIA_API_KEY | x-api-key: $GRADIUM_API_KEY
X-API-Key | x-api-key
Provider API version header | Not required by Gradium
access_token for browser clients | Gradium short-lived ?token=...; see Browser WebSockets.
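
For server-side adapters, the header mapping above can be sketched as a small helper. This is a minimal illustration, not part of any Gradium SDK: the function name to_gradium_headers is hypothetical, and "cartesia-version" is an assumed name for the provider's API version header, so adjust it to whatever your existing integration actually sends.

```python
def to_gradium_headers(provider_headers: dict[str, str], gradium_api_key: str) -> dict[str, str]:
    """Return a copy of provider_headers rewritten for Gradium server-side auth."""
    # Header names to drop, compared case-insensitively.
    # "cartesia-version" is an ASSUMED name for the provider version header.
    dropped = {"authorization", "x-api-key", "cartesia-version"}
    headers = {
        name: value
        for name, value in provider_headers.items()
        if name.lower() not in dropped
    }
    # Gradium authenticates server requests with a single x-api-key header.
    headers["x-api-key"] = gradium_api_key
    return headers
```

Browser clients should not use this path; they connect with a short-lived ?token=... value instead of the API key.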

Gradium POST TTS

Gradium’s POST endpoint returns audio bytes when only_audio is true, so the rest of your “write this response to a file or player” code can stay the same.
Gradium
curl -L -X POST https://api.gradium.ai/api/post/speech/tts \
  -H "x-api-key: $GRADIUM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from Gradium.",
    "voice_id": "YTpq7expH9539ERJ",
    "output_format": "wav",
    "only_audio": true
  }' \
  > output.wav
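
The same request can be issued from Python with only the standard library. The sketch below mirrors the curl call field for field; the helper names build_tts_request and synthesize_to_file are hypothetical, and it assumes GRADIUM_API_KEY is set in the environment.

```python
import json
import os
import urllib.request

GRADIUM_TTS_URL = "https://api.gradium.ai/api/post/speech/tts"


def build_tts_request(text: str, voice_id: str, api_key: str,
                      output_format: str = "wav") -> urllib.request.Request:
    """Build the POST request; nothing is sent until urlopen() is called."""
    body = {
        "text": text,
        "voice_id": voice_id,
        "output_format": output_format,
        "only_audio": True,  # ask for raw audio bytes in the response body
    }
    return urllib.request.Request(
        GRADIUM_TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, voice_id: str, path: str) -> None:
    """Send the request and write the returned audio bytes to path."""
    request = build_tts_request(text, voice_id, os.environ["GRADIUM_API_KEY"])
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Calling synthesize_to_file("Hello from Gradium.", "YTpq7expH9539ERJ", "output.wav") would correspond to the curl example. Keeping request construction separate from sending makes the adapter easy to unit test without network access.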

TTS Field Mapping

Existing field | Gradium field
transcript | text
voice.id | voice_id
model_id | model_name
output_format.container / encoding / sample rate | output_format string
language or localized voice choice | Voice language plus optional json_config.language / rewrite rules
Pronunciation dictionary | pronunciation_id
Gradium uses a compact output_format string: wav, pcm, opus, ulaw_8000, alaw_8000, or explicit PCM rates such as pcm_16000.
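
The field mapping can be expressed as a pure translation function inside an existing adapter. The sketch below is illustrative only: the function names are hypothetical, the provider encoding names (pcm_s16le, pcm_mulaw, pcm_alaw) are assumptions about the existing request shape, and emitting pcm_<rate> for arbitrary sample rates assumes Gradium accepts that rate, which should be checked against the Gradium format list. Provider model IDs do not carry over, so the caller supplies the Gradium model_name.

```python
def to_gradium_output_format(fmt: dict) -> str:
    """Collapse a structured output_format into a compact Gradium format string."""
    container = fmt.get("container")
    encoding = fmt.get("encoding", "")
    sample_rate = fmt.get("sample_rate")
    if container == "wav":
        return "wav"
    # Check the companded encodings first: their ASSUMED provider names
    # also start with "pcm".
    if encoding == "pcm_mulaw":
        return "ulaw_8000"
    if encoding == "pcm_alaw":
        return "alaw_8000"
    if encoding.startswith("pcm"):
        # ASSUMPTION: the requested rate is one Gradium supports.
        return f"pcm_{sample_rate}" if sample_rate else "pcm"
    raise ValueError(f"no Gradium format mapping for {fmt!r}")


def to_gradium_tts_payload(req: dict, model_name: str = "default") -> dict:
    """Translate an existing-provider TTS request body into Gradium fields."""
    return {
        "text": req["transcript"],       # transcript -> text
        "voice_id": req["voice"]["id"],  # voice.id -> voice_id (use a Gradium voice)
        "model_name": model_name,        # model_id -> model_name
        "output_format": to_gradium_output_format(req["output_format"]),
    }
```

Raising on an unmapped format, rather than silently falling back, surfaces mismatches with the downstream player during migration instead of at playback time.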

WebSocket TTS Migration

A Gradium TTS session starts with a setup message from the client, followed by one or more text messages, then end_of_stream. Use client_req_id when you need to correlate concurrent logical requests on one socket.
Gradium messages
{"type":"setup","voice_id":"YTpq7expH9539ERJ","model_name":"default","output_format":"pcm","client_req_id":"ctx-1","close_ws_on_eos":false}
{"type":"text","text":"Hello from Gradium.","client_req_id":"ctx-1"}
{"type":"end_of_stream","client_req_id":"ctx-1"}
For a single streaming utterance, client_req_id and close_ws_on_eos: false are optional. Use them when you want multiple utterances on one socket.
Existing streaming step | Gradium streaming step
Connect to the provider WebSocket | Connect to wss://api.gradium.ai/api/speech/tts
Start a logical request | Send Gradium setup, optionally with client_req_id
Continue with more text | Send more text messages for the same request
Finish the request | Send end_of_stream for that client_req_id
Receive chunk / audio responses | Receive audio messages
Receive timestamps | Receive TTS text timestamp messages at segment granularity
Cancel context | Stop playback and close the socket, or isolate requests with client_req_id
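
The client-side message order can be captured in a small generator that an adapter feeds to whatever WebSocket client it already uses. This is a sketch under stated assumptions: the name tts_messages is hypothetical, and the fields it emits are exactly those shown in the example messages above.

```python
import json
from typing import Iterable, Iterator, Optional


def tts_messages(voice_id: str, chunks: Iterable[str],
                 client_req_id: Optional[str] = None,
                 output_format: str = "pcm",
                 model_name: str = "default") -> Iterator[str]:
    """Yield JSON frames in order: setup, one text frame per chunk, end_of_stream."""

    def tag(message: dict) -> dict:
        # Attach the request id only when the caller multiplexes requests.
        if client_req_id is not None:
            message["client_req_id"] = client_req_id
        return message

    setup = tag({
        "type": "setup",
        "voice_id": voice_id,
        "model_name": model_name,
        "output_format": output_format,
    })
    if client_req_id is not None:
        # Keep the socket open so further requests can reuse it.
        setup["close_ws_on_eos"] = False
    yield json.dumps(setup)
    for chunk in chunks:
        yield json.dumps(tag({"type": "text", "text": chunk}))
    yield json.dumps(tag({"type": "end_of_stream"}))
```

Each yielded string is one WebSocket text frame, so a send loop reduces to iterating the generator and passing every item to the client's send call.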

TTS Feature Mapping

Existing feature | Gradium equivalent | Migration note
context_id | client_req_id | Use for correlation and multiplexing, not persistent prosody across unrelated requests.
continue: true/false | Multiple text messages then end_of_stream | Keep chunks as complete words/phrases; use <flush> at natural boundaries.
Concurrent contexts | Multiplexing | Set close_ws_on_eos: false and route by client_req_id.
transcript | text | Same concept, different field name.
Structured output_format | output_format string | Choose the Gradium format closest to your downstream player.
Word/phoneme timestamps | Segment timestamps | Gradium documents segment-level text timestamps.
Access tokens for browser | Gradium GET /api/api-keys/token and ?token=... | See Browser WebSockets.
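
Routing by client_req_id on a multiplexed socket might look like the sketch below. It assumes each incoming server message is a JSON object that carries the client_req_id of the request it belongs to; that response shape is an assumption here, not something this page specifies, and the RequestRouter class is hypothetical.

```python
import json
from collections import defaultdict


class RequestRouter:
    """Group incoming WebSocket messages by the request they belong to."""

    def __init__(self) -> None:
        # client_req_id -> messages received for that request, in arrival order
        self.inbox: defaultdict = defaultdict(list)
        # messages that carry no request id
        self.unrouted: list = []

    def dispatch(self, raw: str) -> None:
        """Parse one incoming text frame and file it under its request id."""
        message = json.loads(raw)
        # ASSUMPTION: responses echo the client_req_id sent in setup.
        req_id = message.get("client_req_id")
        if req_id is None:
            self.unrouted.append(message)
        else:
            self.inbox[req_id].append(message)
```

An adapter that previously kept one playback queue per context_id can keep the same queues and key them by client_req_id instead.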

Gradium Streaming STT

Gradium STT uses JSON messages for direct WebSocket audio, with base64 payloads. It returns text, end_text, step, flushed, and end_of_stream messages.
Gradium STT
{"type":"setup","model_name":"default","input_format":"pcm","json_config":{"language":"en"}}
{"type":"audio","audio":"base64_encoded_audio"}
{"type":"flush","flush_id":1}
{"type":"end_of_stream"}
Existing STT concept | Gradium STT equivalent
Binary audio frames | Base64 audio JSON messages
encoding and sample_rate connection params | input_format such as pcm_16000, pcm_24000, ulaw_8000
transcript.is_final | text plus end_text segment finalization
finalize | flush with a flush_id
flush_done | flushed with the same flush_id
done | end_of_stream
Word timestamps | Segment timestamps in public Gradium docs
External VAD / turn events | Semantic VAD step messages
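
Adapters that previously sent binary audio frames need to wrap each chunk in a JSON message. The following sketch shows one way to do that with the standard library; the function names are hypothetical, and the 3840-byte default chunk size is an arbitrary illustrative value, not a Gradium requirement.

```python
import base64
import json
from typing import Iterator


def stt_messages(audio: bytes, chunk_size: int = 3840,
                 input_format: str = "pcm", language: str = "en") -> Iterator[str]:
    """Yield JSON frames in order: setup, base64 audio chunks, end_of_stream."""
    yield json.dumps({
        "type": "setup",
        "model_name": "default",
        "input_format": input_format,
        "json_config": {"language": language},
    })
    # chunk_size is an ASSUMED value; tune it to your capture buffer size.
    for start in range(0, len(audio), chunk_size):
        chunk = audio[start:start + chunk_size]
        yield json.dumps({
            "type": "audio",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
    yield json.dumps({"type": "end_of_stream"})


def flush_message(flush_id: int) -> str:
    """Build a flush frame; the matching flushed reply carries the same flush_id."""
    return json.dumps({"type": "flush", "flush_id": flush_id})
```

For live capture, the same audio-frame construction applies per buffer; send flush_message(n) wherever the old integration sent finalize and wait for the flushed message with the same flush_id.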

Adapter Checklist

  • Replace provider URLs with matching Gradium endpoints.
  • Remove provider API-version headers; Gradium does not require them.
  • Change auth to x-api-key on servers or temporary ?token=... for browser WebSockets.
  • Rename transcript to text.
  • Flatten voice.id to voice_id.
  • Convert structured output_format into a Gradium format string.
  • For context-style routing, map the request identifier to client_req_id when you need response correlation.
  • For STT direct WebSocket clients, base64 encode audio inside JSON messages.

Next steps

  • Gradium TTS WebSocket guide: Streaming setup messages, audio messages, flush, and timestamps.
  • Multiplexing: Replace context-style routing with client_req_id.
  • Browser WebSockets: Use short-lived tokens without exposing API keys.
  • Telephony audio: Map low-sample-rate PCM, mu-law, and A-law formats.