Migrate from Deepgram

Keep your existing adapter shape: audio in still produces transcript events, and text in still produces audio bytes or chunks. Move the implementation to Gradium endpoints, `x-api-key` auth, Gradium `voice_id` values, and Gradium message types. For realtime STT, Gradium adds semantic VAD, adaptive delay, and explicit flush handling for turn-taking.

Endpoint Swap

| Flow | Existing integration | Gradium |
| --- | --- | --- |
| Pre-recorded STT | `POST https://api.deepgram.com/v1/listen` | `POST https://api.gradium.ai/api/post/speech/asr` |
| Streaming STT | `wss://api.deepgram.com/v1/listen` | `wss://api.gradium.ai/api/speech/asr` |
| One-shot TTS | `POST https://api.deepgram.com/v1/speak` | `POST https://api.gradium.ai/api/post/speech/tts` |
| Streaming TTS | `wss://api.deepgram.com/v1/speak` | `wss://api.gradium.ai/api/speech/tts` |
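
If your adapter keeps provider URLs in configuration, the swap can be expressed as a lookup. The sketch below copies the four pairs from the table; `gradium_endpoint` is a hypothetical helper name, and dropping the Deepgram query string is an assumption made because Deepgram parameter names are provider-specific.

```python
from urllib.parse import urlsplit

# Deepgram -> Gradium endpoint pairs, copied from the Endpoint Swap table.
ENDPOINT_SWAP = {
    "https://api.deepgram.com/v1/listen": "https://api.gradium.ai/api/post/speech/asr",
    "wss://api.deepgram.com/v1/listen": "wss://api.gradium.ai/api/speech/asr",
    "https://api.deepgram.com/v1/speak": "https://api.gradium.ai/api/post/speech/tts",
    "wss://api.deepgram.com/v1/speak": "wss://api.gradium.ai/api/speech/tts",
}


def gradium_endpoint(deepgram_url: str) -> str:
    """Return the Gradium endpoint for a Deepgram URL (hypothetical helper).

    The query string is discarded: Deepgram query parameters have no
    meaning on the Gradium side and must be mapped separately.
    """
    parts = urlsplit(deepgram_url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    if base not in ENDPOINT_SWAP:
        raise ValueError(f"no Gradium equivalent known for {base}")
    return ENDPOINT_SWAP[base]
```

Keeping the table in one place means the rest of the adapter never hard-codes a provider URL.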

Auth Mapping

| Existing auth | Gradium |
| --- | --- |
| `Authorization: Token $DEEPGRAM_API_KEY` | `x-api-key: $GRADIUM_API_KEY` |
| `Authorization: Bearer <token>` | Browser/mobile clients should use Gradium temporary WebSocket tokens. See Browser WebSockets. |
| API key in WebSocket headers | `x-api-key` header, or `?token=...` for short-lived browser tokens. |
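
As a rough illustration of the mapping above, the sketch below swaps a Deepgram `Authorization` header for `x-api-key` on the server side and builds a `?token=...` URL for browser-style clients. The function names are hypothetical, and how a temporary token is obtained is out of scope here.

```python
from typing import Dict
from urllib.parse import urlencode


def migrate_headers(headers: Dict[str, str], gradium_api_key: str) -> Dict[str, str]:
    """Drop any Authorization header and add x-api-key (server-side use)."""
    migrated = {k: v for k, v in headers.items() if k.lower() != "authorization"}
    migrated["x-api-key"] = gradium_api_key
    return migrated


def browser_ws_url(base_url: str, temporary_token: str) -> str:
    """Append a short-lived token as ?token=... for clients that cannot set headers."""
    return f"{base_url}?{urlencode({'token': temporary_token})}"
```

Long-lived API keys stay on the server in this arrangement; only short-lived tokens reach the browser.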

Gradium Speech-to-Text POST

For pre-recorded audio, keep sending the audio bytes in the request body; change only the URL and the auth header. Gradium streams the result back as newline-delimited JSON (NDJSON) messages.

Gradium

curl -L -X POST https://api.gradium.ai/api/post/speech/asr \
  -H "x-api-key: $GRADIUM_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @input.wav

Because the response is NDJSON, build the transcript by collecting `text` messages, and pair each one with its `end_text` message when you need segment end timestamps.
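
A minimal sketch of that pairing, given the NDJSON lines of a response. It assumes `end_text` messages arrive in the same order as the `text` messages they close, and that joining segment texts with a space is acceptable; `Segment`, `collect_segments`, and `full_transcript` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Segment:
    text: str
    start_s: Optional[float]
    stop_s: Optional[float] = None


def collect_segments(lines: Iterable[str]) -> List[Segment]:
    """Parse NDJSON lines into segments, pairing text with end_text in order."""
    segments: List[Segment] = []
    closed = 0  # number of segments that already received an end_text
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between messages
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "text":
            segments.append(Segment(msg["text"], msg.get("start_s")))
        elif kind == "end_text" and closed < len(segments):
            # Assumption: the Nth end_text closes the Nth text message.
            segments[closed].stop_s = msg.get("stop_s")
            closed += 1
        # Any other message type is ignored here.
    return segments


def full_transcript(segments: List[Segment]) -> str:
    """Join segment texts with single spaces (an assumption about spacing)."""
    return " ".join(seg.text.strip() for seg in segments)
```

With an HTTP client that exposes the body line by line, the same function can consume the stream incrementally instead of waiting for the full response.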

Speech-to-Text WebSocket

Gradium’s direct WebSocket protocol sends audio in JSON messages with base64 payloads:

Gradium messages

{"type":"setup","model_name":"default","input_format":"pcm","json_config":{"language":"en"}}
{"type":"audio","audio":"base64_encoded_audio"}
{"type":"end_of_stream"}

| Existing STT concept | Gradium STT message |
| --- | --- |
| WebSocket connect | Connect to `wss://api.gradium.ai/api/speech/asr` |
| Model and language settings | `setup.model_name`, `setup.json_config.language` |
| Audio frame | `{"type":"audio","audio":"base64_encoded_audio"}` |
| Transcript result | `{"type":"text","text":"...","start_s":...}` |
| Final segment timing | `{"type":"end_text","stop_s":...}` |
| Endpointing / turn signal | `step.vad[*].inactivity_prob` |
| Close/finalize stream | `{"type":"end_of_stream"}` |
| Force pending output | `{"type":"flush","flush_id":1}` |

Set `input_format` to match the audio you send: `pcm`, `wav`, `opus`, `ulaw_8000`, `alaw_8000`, or another supported Gradium format.
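
An adapter that used to send raw binary frames now has to wrap each chunk in JSON. The sketch below generates the client-side message sequence shown above; `stt_messages` and `flush_message` are hypothetical helper names, and the defaults simply mirror the example `setup` message.

```python
import base64
import json
from typing import Iterable, Iterator


def stt_messages(
    chunks: Iterable[bytes],
    input_format: str = "pcm",
    language: str = "en",
    model_name: str = "default",
) -> Iterator[str]:
    """Yield setup, one audio message per chunk, then end_of_stream."""
    yield json.dumps({
        "type": "setup",
        "model_name": model_name,
        "input_format": input_format,
        "json_config": {"language": language},
    })
    for chunk in chunks:
        # Audio bytes travel as base64 text inside a JSON message.
        yield json.dumps({
            "type": "audio",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
    yield json.dumps({"type": "end_of_stream"})


def flush_message(flush_id: int) -> str:
    """Build a flush request that forces pending output."""
    return json.dumps({"type": "flush", "flush_id": flush_id})
```

Each yielded string is meant to be sent as one WebSocket text frame, in order, by whichever WebSocket client your adapter already uses.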

STT Feature Mapping

| Existing feature | Gradium equivalent | Migration note |
| --- | --- | --- |
| `model=nova-*` | `model_name: "default"`, unless Gradium gives you another model alias | Model names are provider-specific. |
| `language=en-US` | `json_config: {"language": "en"}` | Gradium uses short language codes: `en`, `fr`, `de`, `es`, `pt`. |
| Interim transcript events | Streaming `text` plus `end_text` segment finalization | Gradium emits text segments and separate end timestamps. |
| `speech_final` / `endpointing` | Semantic VAD `step` messages | Use inactivity probabilities across horizons; see Turn-Taking. |
| `utterance_end_ms` | VAD thresholding plus `send_flush()` | Implement turn-end policy in your app. |
| `encoding`, `sample_rate` | `input_format` such as `pcm_16000`, `ulaw_8000` | Pick the exact Gradium format for your audio bytes. |
| `punctuate`, `smart_format` | Built into model behavior where available | There is no one-to-one request flag. |
| `diarize`, channels, alternatives | No direct public Gradium equivalent in these docs | Keep provider-specific code behind an adapter if you rely on these. |
| Word-level timestamps | Segment timestamps | Gradium currently documents segment-level timestamps. |
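
One way to apply this table is to translate an existing Deepgram query string into a Gradium `setup` message and report the parameters that have no equivalent. This is a sketch under stated assumptions: the encoding-to-prefix map and the `<prefix>_<sample_rate>` naming are extrapolated from the `pcm_16000` and `ulaw_8000` examples, so check the result against Gradium's list of supported formats. `setup_from_deepgram_url` is a hypothetical name.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs, urlsplit

# Assumption: Deepgram encoding names mapped to Gradium format prefixes.
ENCODING_PREFIX = {"linear16": "pcm", "mulaw": "ulaw", "alaw": "alaw"}

# Parameters the table lists as having no direct Gradium equivalent.
NO_EQUIVALENT = {"diarize", "channels", "multichannel", "alternatives"}


def setup_from_deepgram_url(url: str) -> Tuple[Dict, List[str]]:
    """Return (gradium_setup_message, unsupported_param_names)."""
    params = {k: v[0] for k, v in parse_qs(urlsplit(url).query).items()}

    # Model names are provider-specific, so the Deepgram model is not carried over.
    setup: Dict = {"type": "setup", "model_name": "default"}

    if "language" in params:
        # en-US -> en: keep only the short language code.
        short_code = params["language"].split("-")[0].lower()
        setup["json_config"] = {"language": short_code}

    prefix = ENCODING_PREFIX.get(params.get("encoding", ""))
    if prefix:
        rate = params.get("sample_rate")
        # Assumption: formats follow the <prefix>_<rate> pattern.
        setup["input_format"] = f"{prefix}_{rate}" if rate else prefix

    # punctuate and smart_format are dropped silently: no request flag exists.
    unsupported = sorted(set(params) & NO_EQUIVALENT)
    return setup, unsupported
```

Returning the unsupported names, rather than ignoring them, lets the adapter log or reject requests that depend on features such as diarization.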

Gradium Text-to-Speech POST

Gradium uses `voice_id` in the JSON body and returns raw audio bytes when `only_audio` is `true`, so your existing file-write or playback code can usually remain unchanged.

Gradium

curl -L -X POST https://api.gradium.ai/api/post/speech/tts \
  -H "x-api-key: $GRADIUM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from Gradium.",
    "voice_id": "YTpq7expH9539ERJ",
    "output_format": "wav",
    "only_audio": true
  }' \
  > output.wav
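
For adapters written in Python, the same request can be built with the standard library. This is a sketch that mirrors the curl call above; `build_tts_request` and `synthesize_to_file` are hypothetical names, and error handling is left out.

```python
import json
import urllib.request

TTS_URL = "https://api.gradium.ai/api/post/speech/tts"


def build_tts_request(
    text: str, voice_id: str, api_key: str, output_format: str = "wav"
) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = {
        "text": text,
        "voice_id": voice_id,
        "output_format": output_format,
        "only_audio": True,  # ask for raw audio bytes in the response body
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request: urllib.request.Request, path: str) -> None:
    """Send the request and write the returned audio bytes to a file."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Separating request construction from sending keeps the first function easy to unit-test without network access.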

Text-to-Speech WebSocket

For streaming TTS, use Gradium `setup`, `text`, and `end_of_stream` messages:

| Existing TTS behavior | Gradium TTS message |
| --- | --- |
| Send text | `{"type":"text","text":"..."}` after `setup` |
| Force generation at a boundary | Include `<flush>` in a `text` message at a natural boundary |
| Finish generation | `{"type":"end_of_stream"}` for graceful completion, or close the socket to interrupt |
| Binary/audio response frames | `{"type":"audio","audio":"base64..."}` |
| Metadata response | `ready` message plus `request_id` |
| Warning response | `error` message for terminal errors; validate chunking and limits in your app |

Gradium messages

{"type":"setup","voice_id":"YTpq7expH9539ERJ","model_name":"default","output_format":"pcm"}
{"type":"text","text":"Hello from Gradium. <flush>"}
{"type":"end_of_stream"}
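
When text arrives in pieces, for example from an LLM, the outbound sequence can be generated rather than hand-written. A minimal sketch, assuming chunks are forwarded verbatim (so the caller controls whitespace) and that a single `<flush>` on the final chunk is enough; `tts_messages` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def tts_messages(
    chunks: Iterable[str],
    voice_id: str,
    output_format: str = "pcm",
    model_name: str = "default",
    flush_at_end: bool = True,
) -> Iterator[str]:
    """Yield setup, one text message per chunk, then end_of_stream."""
    yield json.dumps({
        "type": "setup",
        "voice_id": voice_id,
        "model_name": model_name,
        "output_format": output_format,
    })
    pending = list(chunks)
    for index, chunk in enumerate(pending):
        is_last = index == len(pending) - 1
        if is_last and flush_at_end:
            # Flush sparingly: only once, at the end of the answer.
            chunk = f"{chunk} <flush>"
        yield json.dumps({"type": "text", "text": chunk})
    yield json.dumps({"type": "end_of_stream"})
```

Called with the single chunk `"Hello from Gradium."`, this yields the same three messages as the example above.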

TTS Feature Mapping

| Existing feature | Gradium equivalent | Migration note |
| --- | --- | --- |
| Speak REST `/v1/speak` | `POST /api/post/speech/tts` | Use `only_audio: true` for raw audio responses. |
| Speak WebSocket `/v1/speak` | `wss://api.gradium.ai/api/speech/tts` | Send `setup` before text. |
| `model=aura-*` | `voice_id` plus optional `model_name` | Choose a Gradium voice from the voice library or your custom voice. |
| Output `encoding` / `sample_rate` | `output_format` | Use `pcm`, `pcm_16000`, `ulaw_8000`, `wav`, `opus`, etc. |
| `Flush` command | `<flush>` text tag | Flush sparingly, usually after LLM sentence or answer boundaries. |
| `Close` command | `end_of_stream` or close socket | Use `end_of_stream` for graceful completion. |
| TTS metadata | `ready.request_id`, `ready.sample_rate`, `ready.frame_size` | Log `request_id` for debugging. |
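
On the receiving side, an adapter that previously consumed binary frames now decodes base64 `audio` messages and reads metadata from `ready`. The sketch below is one possible shape; `TtsSession` is a hypothetical class, and since the fields of an `error` message are not described here, the whole message is included in the exception.

```python
import base64
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TtsSession:
    request_id: Optional[str] = None
    sample_rate: Optional[int] = None
    frame_size: Optional[int] = None
    audio: bytearray = field(default_factory=bytearray)

    def handle(self, raw: str) -> None:
        """Process one server message received as a JSON text frame."""
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "ready":
            # Keep request_id so it can be logged for debugging.
            self.request_id = msg.get("request_id")
            self.sample_rate = msg.get("sample_rate")
            self.frame_size = msg.get("frame_size")
        elif kind == "audio":
            self.audio += base64.b64decode(msg["audio"])
        elif kind == "error":
            # Errors are terminal: surface them with the request_id attached.
            raise RuntimeError(
                f"Gradium TTS error (request_id={self.request_id}): {msg}"
            )
        # Other message types are ignored in this sketch.
```

Existing playback or file-write code can then read from `session.audio` exactly as it read provider audio bytes before.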

Adapter Checklist

Replace provider URLs with matching Gradium endpoints.
Change auth to `x-api-key`, or use browser-safe tokens for client apps.
For STT, map query params into `setup` and `json_config`.
For STT WebSocket, wrap audio bytes as base64 JSON messages.
For TTS, replace provider voice/model names with a Gradium `voice_id`.
For TTS WebSocket, send a Gradium `setup` message before text.
Replace provider finality fields with Gradium `end_text`, `step`, and `flushed` handling.
Keep provider-specific features such as diarization behind adapter capability checks.
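
The finality item is the one most likely to need new logic, so here is a rough sketch of a turn-end policy driven by `step` messages. Everything specific in it is an assumption: that `step` carries a `vad` list whose entries each have `inactivity_prob`, that a turn ends when every horizon reaches a threshold, that `flushed` echoes the `flush_id`, and the 0.8 threshold itself. `TurnDetector` is a hypothetical class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnDetector:
    threshold: float = 0.8  # hypothetical value; tune for your application
    was_quiet: bool = False
    next_flush_id: int = 1
    awaiting_flush: Optional[int] = None

    def handle(self, msg: dict) -> Optional[dict]:
        """Return a flush message to send when a turn appears to end, else None."""
        kind = msg.get("type")
        if kind == "flushed":
            # Assumption: the server echoes the flush_id it has completed.
            if msg.get("flush_id") == self.awaiting_flush:
                self.awaiting_flush = None
        elif kind == "step":
            probs = [
                h["inactivity_prob"]
                for h in msg.get("vad", [])
                if "inactivity_prob" in h
            ]
            # Assumption: quiet means every horizon is at or above the threshold.
            quiet = bool(probs) and min(probs) >= self.threshold
            became_quiet = quiet and not self.was_quiet
            self.was_quiet = quiet
            # Flush once per transition into silence, never while one is pending.
            if became_quiet and self.awaiting_flush is None:
                self.awaiting_flush = self.next_flush_id
                self.next_flush_id += 1
                return {"type": "flush", "flush_id": self.awaiting_flush}
        return None
```

Feed every parsed server message to `handle` and send whatever it returns; `text` and `end_text` messages pass through untouched and can be handled by your transcript code.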

Next steps

Gradium STT WebSocket guide

Real-time audio streaming, semantic VAD, and flush.

Turn-taking recipe

Replace endpointing and speech-final logic with Gradium VAD.

Gradium TTS WebSocket guide

Streaming text-to-speech over WebSocket.

WebSocket Lifecycle

Setup, ready, input, flush, end, multiplexing, and errors.
