Most voice API migrations come down to the same small swap: point your existing request code at Gradium, send your Gradium API key in the x-api-key header, and use a Gradium voice_id or model setting. The bigger win is that the same API covers realtime TTS, realtime STT, semantic VAD, adaptive delay, browser-safe WebSocket tokens, and custom voices. Your app can keep the same shape:
  • POST when you already have the full input.
  • WebSocket when you want streaming input or low-latency output.
  • Audio bytes or streamed chunks come back in the same places your current provider integration already handles them.
  • Semantic VAD and adaptive delay give voice agents first-class turn-taking signals instead of forcing you to bolt on endpointing heuristics.
  • Browser clients should use short-lived Gradium tokens instead of embedding API keys. See Browser WebSockets.
If you already wrapped ElevenLabs, Cartesia, or Deepgram behind a small provider adapter, migrating is usually just changing the URL, auth header, and a few field names.
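As a rough illustration of that adapter swap, the sketch below keeps the provider-specific pieces (base URL, auth header, voice field name) in one config object. The `ProviderConfig` dataclass and `build_tts_call` helper are hypothetical names invented for this example, not part of any Gradium SDK; only the URL, header name, and field names come from this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderConfig:
    """Hypothetical shape: the few values a thin TTS adapter swaps per provider."""
    base_url: str
    tts_path: str
    auth_header: str
    voice_field: str


# Values taken from this page.
GRADIUM = ProviderConfig(
    base_url="https://api.gradium.ai/api",
    tts_path="/post/speech/tts",
    auth_header="x-api-key",
    voice_field="voice_id",
)


def build_tts_call(cfg: ProviderConfig, api_key: str, text: str, voice: str):
    """Return (url, headers, body) for a one-shot TTS request; sends nothing."""
    url = cfg.base_url + cfg.tts_path
    headers = {cfg.auth_header: api_key, "Content-Type": "application/json"}
    body = {"text": text, cfg.voice_field: voice}
    return url, headers, body
```

With this layout, the rest of the application calls `build_tts_call` and never sees which provider is configured, so a migration touches only the config values.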

Gradium POST example

For a complete text block, send one HTTP request and write the audio response to a file:
curl -L -X POST https://api.gradium.ai/api/post/speech/tts \
  -H "x-api-key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello from Gradium.", "voice_id": "YTpq7expH9539ERJ", "output_format": "wav", "only_audio": true}' \
  > output.wav
That is the whole path for one-shot TTS: request body in, audio bytes out. For the full schema, see Text-to-Speech REST.
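The same request can be sketched in Python with only the standard library. The function names below are illustrative; the URL, headers, and body fields mirror the curl command above, and the response is assumed to be raw audio bytes because only_audio is set to true.

```python
import json
import urllib.request

TTS_URL = "https://api.gradium.ai/api/post/speech/tts"


def build_tts_request(api_key: str, text: str, voice_id: str) -> urllib.request.Request:
    """Build (but do not send) the one-shot TTS request shown in the curl example."""
    payload = {
        "text": text,
        "voice_id": voice_id,
        "output_format": "wav",
        "only_audio": True,
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(api_key: str, text: str, voice_id: str, path: str) -> None:
    """Send the request and write the response body to `path`.

    Assumes the body is the audio itself, as in the curl example.
    """
    req = build_tts_request(api_key, text, voice_id)
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

Calling `synthesize_to_file("your_api_key", "Hello from Gradium.", "YTpq7expH9539ERJ", "output.wav")` would correspond to the curl command; splitting out `build_tts_request` keeps the request shape inspectable without network access.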

Gradium WebSocket example

For streaming TTS, connect to the Gradium WebSocket, send setup once, then send text:
wscat -c "wss://api.gradium.ai/api/speech/tts" \
  -H "x-api-key: your_api_key"
After the connection opens, send:
{"type":"setup","voice_id":"YTpq7expH9539ERJ","model_name":"default","output_format":"wav"}
{"type":"text","text":"Hello from Gradium."}
{"type":"end_of_stream"}
Gradium streams audio messages back with base64-encoded audio chunks. For the full message contract, see Text-to-Speech WebSocket.
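The message sequence above can be generated programmatically, whatever WebSocket client you use. In the sketch below, `tts_messages` reproduces the setup, text, and end_of_stream messages from this page. `decode_audio` is more speculative: this page only says audio arrives as base64-encoded chunks, so the `"audio"` message type and the `"audio"` field name are assumptions to check against the Text-to-Speech WebSocket reference.

```python
import base64
import json
from typing import Iterable, Iterator


def tts_messages(voice_id: str, texts: Iterable[str]) -> Iterator[str]:
    """Yield the JSON frames for one streaming TTS session, in send order."""
    yield json.dumps({
        "type": "setup",
        "voice_id": voice_id,
        "model_name": "default",
        "output_format": "wav",
    })
    for text in texts:
        yield json.dumps({"type": "text", "text": text})
    yield json.dumps({"type": "end_of_stream"})


def decode_audio(raw_message: str, field: str = "audio") -> bytes:
    """Return decoded audio bytes, or b"" for non-audio messages.

    ASSUMPTION: audio messages have type "audio" and carry the base64
    payload in `field`; neither name is confirmed by this page.
    """
    msg = json.loads(raw_message)
    if msg.get("type") != "audio":
        return b""
    return base64.b64decode(msg[field])
```

Each string from `tts_messages` is sent as one text frame; each incoming frame is passed through `decode_audio` and the resulting bytes are appended to your playback buffer or file.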

Provider guides

ElevenLabs to Gradium

Move existing TTS calls to Gradium REST and WebSocket endpoints.

Cartesia to Gradium

Move TTS and STT adapters to Gradium request fields and message types.

Deepgram to Gradium

Replace speech adapters with Gradium STT, TTS, semantic VAD, and flush.

What usually changes

| Area | Change |
| --- | --- |
| Base URL | Use https://api.gradium.ai/api for REST and wss://api.gradium.ai/api for WebSockets. |
| Auth | Send x-api-key: your_api_key. |
| TTS voice | Pass a Gradium voice_id in the request body or WebSocket setup message. |
| TTS output | Use output_format, for example wav, pcm, or opus. |
| Streaming start | Send a Gradium setup message first on WebSocket connections. |
| Streaming end | Send {"type":"end_of_stream"} when you are done sending input. |
| Browser auth | Generate a temporary token with GET /api/api-keys/token, then connect with ?token=.... |
| Concurrent WebSocket requests | Use client_req_id and close_ws_on_eos: false. |
| Turn-taking | Use STT step messages, inactivity_prob, delay_in_frames, and flush. |
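For the browser auth step, a backend would typically fetch the temporary token and hand the browser a ready-made WebSocket URL. The helper below is a minimal sketch of the URL-building half only; `browser_ws_url` is a hypothetical name, and how the token is read from the GET /api/api-keys/token response is left out because this page does not describe the response shape.

```python
from urllib.parse import urlencode

WS_BASE = "wss://api.gradium.ai/api"


def browser_ws_url(path: str, token: str) -> str:
    """Append a short-lived token as the `token` query parameter.

    `path` is a WebSocket route from this page, such as "/speech/tts".
    The token is URL-encoded in case it contains reserved characters.
    """
    return f"{WS_BASE}{path}?{urlencode({'token': token})}"
```

The browser then opens the returned URL directly, so the long-lived API key never leaves the server.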

Speech-to-text endpoints

If you are migrating an STT integration, use the same idea with the STT routes:
| Flow | Gradium endpoint |
| --- | --- |
| Complete audio file | POST https://api.gradium.ai/api/post/speech/asr |
| Live audio stream | wss://api.gradium.ai/api/speech/asr |
See Speech-to-Text REST and Speech-to-Text WebSocket for the message formats.

Production patterns

WebSocket lifecycle

Setup, ready, input, flush, end-of-stream, multiplexing, and errors.

Browser WebSockets

Issue short-lived tokens for browser and mobile WebSocket clients.