x-api-key auth, Gradium
voice_id values, and Gradium message types. For realtime STT,
Gradium adds semantic VAD, adaptive delay, and explicit flush handling
for turn-taking.
Endpoint Swap
| Flow | Existing integration | Gradium |
|---|---|---|
| Pre-recorded STT | POST https://api.deepgram.com/v1/listen | POST https://api.gradium.ai/api/post/speech/asr |
| Streaming STT | wss://api.deepgram.com/v1/listen | wss://api.gradium.ai/api/speech/asr |
| One-shot TTS | POST https://api.deepgram.com/v1/speak | POST https://api.gradium.ai/api/post/speech/tts |
| Streaming TTS | wss://api.deepgram.com/v1/speak | wss://api.gradium.ai/api/speech/tts |
Auth Mapping
| Existing auth | Gradium |
|---|---|
| Authorization: Token $DEEPGRAM_API_KEY | x-api-key: $GRADIUM_API_KEY |
| Authorization: Bearer <token> | Browser/mobile clients should use Gradium temporary WebSocket tokens. See Browser WebSockets. |
| API key in WebSocket headers | x-api-key header, or ?token=... for short-lived browser tokens. |
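
The two tables above can be folded into a small lookup on the adapter side. The following is a minimal sketch, not an official client: the helper names `gradium_url` and `gradium_headers` are hypothetical, and the URL pairs are copied from the Endpoint Swap table.

```python
# Hypothetical adapter helpers; the URL pairs come from the Endpoint Swap table.
ENDPOINT_SWAP = {
    "https://api.deepgram.com/v1/listen": "https://api.gradium.ai/api/post/speech/asr",
    "wss://api.deepgram.com/v1/listen": "wss://api.gradium.ai/api/speech/asr",
    "https://api.deepgram.com/v1/speak": "https://api.gradium.ai/api/post/speech/tts",
    "wss://api.deepgram.com/v1/speak": "wss://api.gradium.ai/api/speech/tts",
}


def gradium_url(existing_url: str) -> str:
    """Return the Gradium endpoint for an existing URL.

    The query string is dropped on purpose: provider query parameters
    do not carry over and are expressed in Gradium request fields instead.
    """
    base = existing_url.split("?", 1)[0]
    return ENDPOINT_SWAP[base]


def gradium_headers(api_key: str) -> dict[str, str]:
    """Server-side auth: x-api-key replaces 'Authorization: Token ...'."""
    return {"x-api-key": api_key}
```

Browser and mobile clients would not use `gradium_headers` directly, since they should rely on temporary WebSocket tokens rather than a long-lived API key.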
Gradium Speech-to-Text POST
For pre-recorded audio, keep sending the audio bytes in the request body and switch the URL. Gradium streams newline-delimited JSON messages back. Build the transcript by collecting the
text messages and pairing them with end_text when you need segment
end timestamps.
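
As a rough illustration of consuming that newline-delimited stream, the sketch below parses each line and pairs `text` messages with `end_text` messages. The pairing rule (each `end_text` closes the earliest segment that has no stop time yet) and the space-joined transcript are assumptions made for this example, not documented behavior; `Segment`, `assemble_segments`, and `transcript` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Segment:
    text: str
    start_s: float
    stop_s: Optional[float] = None  # filled in when a matching end_text arrives


def assemble_segments(lines: Iterable[str]) -> list[Segment]:
    """Parse newline-delimited JSON lines into transcript segments."""
    segments: list[Segment] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between messages
        msg = json.loads(line)
        if msg.get("type") == "text":
            segments.append(Segment(msg["text"], msg["start_s"]))
        elif msg.get("type") == "end_text":
            # Assumption: end_text closes the earliest still-open segment.
            for seg in segments:
                if seg.stop_s is None:
                    seg.stop_s = msg["stop_s"]
                    break
    return segments


def transcript(segments: list[Segment]) -> str:
    """Join segment texts with single spaces (an assumed convention)."""
    return " ".join(seg.text.strip() for seg in segments)
```

With an HTTP client that exposes the response as an iterator of lines, the same function can run incrementally while the response is still streaming.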
Speech-to-Text WebSocket
Gradium’s direct WebSocket protocol sends audio in JSON messages with base64 payloads.
| Existing STT concept | Gradium STT message |
|---|---|
| WebSocket connect | Connect to wss://api.gradium.ai/api/speech/asr |
| Model and language settings | setup.model_name, setup.json_config.language |
| Audio frame | {"type":"audio","audio":"base64_encoded_audio"} |
| Transcript result | {"type":"text","text":"...","start_s":...} |
| Final segment timing | {"type":"end_text","stop_s":...} |
| Endpointing / turn signal | step.vad[*].inactivity_prob |
| Close/finalize stream | {"type":"end_of_stream"} |
| Force pending output | {"type":"flush","flush_id":1} |
Set input_format to match the audio you send: pcm, wav, opus,
ulaw_8000, alaw_8000, or another supported Gradium format.
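
The message shapes in the table can be produced with a few small builders. This is a sketch under stated assumptions: the `audio`, `flush`, and `end_of_stream` payloads follow the table, while the `"type": "setup"` discriminator and the placement of `input_format` at the top level of the setup message are assumptions that should be checked against the Gradium STT WebSocket guide. The function names are hypothetical.

```python
import base64
import json


def stt_setup(input_format: str, language: str = "en", model_name: str = "default") -> str:
    """Build the first message on the socket.

    Assumption: setup is a JSON message with type "setup", and input_format
    sits next to model_name rather than inside json_config.
    """
    return json.dumps({
        "type": "setup",
        "model_name": model_name,
        "input_format": input_format,
        "json_config": {"language": language},
    })


def stt_audio(chunk: bytes) -> str:
    """Wrap raw audio bytes as a base64 JSON audio message."""
    return json.dumps({"type": "audio", "audio": base64.b64encode(chunk).decode("ascii")})


def stt_flush(flush_id: int) -> str:
    """Ask the server to emit pending output; flush_id lets you match the reply."""
    return json.dumps({"type": "flush", "flush_id": flush_id})


def stt_end_of_stream() -> str:
    """Signal that no more audio will be sent."""
    return json.dumps({"type": "end_of_stream"})
```

Each builder returns a string that can be passed to whatever WebSocket client you already use, so the transport code from the existing integration can stay in place.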
STT Feature Mapping
| Existing feature | Gradium equivalent | Migration note |
|---|---|---|
| model=nova-* | model_name: "default" unless given another Gradium model alias | Model names are provider-specific. |
| language=en-US | json_config: {"language": "en"} | Gradium uses short language codes: en, fr, de, es, pt. |
| Interim transcript events | Streaming text plus end_text segment finalization | Gradium emits text segments and separate end timestamps. |
| speech_final / endpointing | Semantic VAD step messages | Use inactivity probabilities across horizons; see Turn-Taking. |
| utterance_end_ms | VAD thresholding plus send_flush() | Implement turn-end policy in your app. |
| encoding, sample_rate | input_format such as pcm_16000, ulaw_8000 | Pick the exact Gradium format for your audio bytes. |
| punctuate, smart_format | Built into model behavior where available | There is no one-to-one request flag. |
| diarize, channels, alternatives | No direct public Gradium equivalent in these docs | Keep provider-specific code behind an adapter if you rely on these. |
| Word-level timestamps | Segment timestamps | Gradium currently documents segment-level timestamps. |
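
One way to apply this mapping is to translate the existing query string into Gradium setup fields in a single place. The sketch below is illustrative only: `setup_from_query` is a hypothetical helper, the encoding-to-prefix table is an assumption, and any format name it produces (for example a `pcm_<rate>` value for an unusual rate) should be checked against the list of supported Gradium formats.

```python
from urllib.parse import parse_qs, urlsplit

# Assumed mapping from existing encoding names to Gradium format prefixes.
ENCODING_PREFIX = {"linear16": "pcm", "mulaw": "ulaw", "alaw": "alaw"}
# Prefixes for which this sketch appends the sample rate, as in pcm_16000.
RATE_SUFFIXED = {"pcm", "ulaw", "alaw"}
# Features with no direct Gradium equivalent in these docs.
UNSUPPORTED = ("diarize", "channels", "alternatives")


def setup_from_query(url: str) -> tuple[dict, list[str]]:
    """Translate existing STT query params into Gradium setup fields.

    Returns the setup fields plus the names of params that were present
    but have no Gradium equivalent, so the caller can log or reject them.
    punctuate and smart_format are ignored because there is no request flag.
    """
    params = {k: v[-1] for k, v in parse_qs(urlsplit(url).query).items()}
    # Provider model names do not carry over, so fall back to "default".
    setup: dict = {"model_name": "default", "json_config": {}}
    if "language" in params:
        # en-US -> en: Gradium uses short language codes.
        setup["json_config"]["language"] = params["language"].split("-")[0].lower()
    if "encoding" in params:
        prefix = ENCODING_PREFIX.get(params["encoding"], params["encoding"])
        rate = params.get("sample_rate")
        if rate and prefix in RATE_SUFFIXED:
            setup["input_format"] = f"{prefix}_{rate}"
        else:
            setup["input_format"] = prefix
    dropped = [name for name in UNSUPPORTED if name in params]
    return setup, dropped
```

Returning the dropped parameter names instead of silently discarding them makes it easier to spot callers that still depend on diarization or multichannel behavior.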
Gradium Text-to-Speech POST
Gradium uses voice_id in the JSON body and returns raw audio bytes
when only_audio is true, so your existing file-write or playback
code can usually remain unchanged.
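
A request along these lines could be built with the standard library alone. This is a sketch, not the official client: `voice_id`, `only_audio`, `output_format`, the URL, and the `x-api-key` header come from this guide, while the `text` field name, the `Content-Type` header, and the helper names are assumptions.

```python
import json
import urllib.request

TTS_URL = "https://api.gradium.ai/api/post/speech/tts"


def build_tts_request(text: str, voice_id: str, api_key: str,
                      output_format: str = "wav") -> urllib.request.Request:
    """Build (but do not send) a one-shot TTS request."""
    body = {
        "text": text,  # assumed field name for the input text
        "voice_id": voice_id,
        "output_format": output_format,
        "only_audio": True,  # ask for raw audio bytes in the response
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(req: urllib.request.Request, path: str) -> None:
    """Send the request and write the returned audio bytes to disk."""
    with urllib.request.urlopen(req) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

Keeping request construction separate from sending makes the body easy to inspect in tests without touching the network.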
Text-to-Speech WebSocket
For streaming TTS, use Gradium setup, text, and end_of_stream
messages:
| Existing TTS behavior | Gradium TTS message |
|---|---|
| Send text | {"type":"text","text":"..."} after setup |
| Force generation at a boundary | Include <flush> in a text message at a natural boundary |
| Finish generation | {"type":"end_of_stream"} for graceful completion, or close the socket to interrupt |
| Binary/audio response frames | {"type":"audio","audio":"base64..."} |
| Metadata response | ready message plus request_id |
| Warning response | error message for terminal errors; validate chunking and limits in your app |
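
The message flow in the table can be sketched as a pair of builders plus a receive loop. The `text`, `end_of_stream`, `audio`, `ready`, and `error` shapes follow the table; the `"type": "setup"` discriminator, the setup field layout, and all function names are assumptions made for illustration.

```python
import base64
import json
from typing import Iterable, Optional


def tts_setup(voice_id: str, output_format: str = "pcm") -> str:
    """First message on the socket. Assumption: type "setup" with these fields."""
    return json.dumps({"type": "setup", "voice_id": voice_id, "output_format": output_format})


def tts_text(text: str) -> str:
    """Text to synthesize; send only after setup."""
    return json.dumps({"type": "text", "text": text})


def tts_end_of_stream() -> str:
    """Graceful completion; closing the socket instead interrupts generation."""
    return json.dumps({"type": "end_of_stream"})


def collect_audio(frames: Iterable[str]) -> tuple[Optional[str], bytes]:
    """Decode server messages into (request_id, audio bytes).

    Raises RuntimeError on an error message, which is treated as terminal.
    """
    request_id: Optional[str] = None
    audio = bytearray()
    for frame in frames:
        msg = json.loads(frame)
        kind = msg.get("type")
        if kind == "ready":
            request_id = msg.get("request_id")  # worth logging for debugging
        elif kind == "audio":
            audio.extend(base64.b64decode(msg["audio"]))
        elif kind == "error":
            raise RuntimeError(f"Gradium TTS error (request_id={request_id}): {msg}")
    return request_id, bytes(audio)
```

For low-latency playback you would hand each decoded chunk to the audio output as it arrives rather than accumulating it, but the decoding step is the same.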
TTS Feature Mapping
| Existing feature | Gradium equivalent | Migration note |
|---|---|---|
| Speak REST /v1/speak | POST /api/post/speech/tts | Use only_audio: true for raw audio responses. |
| Speak WebSocket /v1/speak | wss://api.gradium.ai/api/speech/tts | Send setup before text. |
| model=aura-* | voice_id plus optional model_name | Choose a Gradium voice from the voice library or your custom voice. |
| Output encoding / sample_rate | output_format | Use pcm, pcm_16000, ulaw_8000, wav, opus, etc. |
| Flush command | <flush> text tag | Flush sparingly, usually after LLM sentence or answer boundaries. |
| Close command | end_of_stream or close socket | Use end_of_stream for graceful completion. |
| TTS metadata | ready.request_id, ready.sample_rate, ready.frame_size | Log request_id for debugging. |
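
To make the flush guidance concrete, the sketch below buffers streamed LLM output, emits one text message per complete sentence, and adds the `<flush>` tag only at the end of the answer. The sentence-splitting rule, the choice to flush once per answer, and the name `text_messages` are assumptions for illustration; where exactly the tag may appear inside a message should be confirmed in the TTS WebSocket guide.

```python
import re
from typing import Iterable

FLUSH_TAG = "<flush>"
# Split after ., ! or ? when followed by whitespace (a simple heuristic).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def text_messages(llm_chunks: Iterable[str], flush_at_end: bool = True) -> list[dict]:
    """Turn streamed LLM chunks into text messages with one final flush."""
    buffer = ""
    messages: list[dict] = []
    for chunk in llm_chunks:
        buffer += chunk
        parts = SENTENCE_END.split(buffer)
        # Every part except the last is a complete sentence.
        for sentence in parts[:-1]:
            messages.append({"type": "text", "text": sentence + " "})
        buffer = parts[-1]
    tail = buffer + (FLUSH_TAG if flush_at_end else "")
    if tail:
        messages.append({"type": "text", "text": tail})
    return messages
```

For example, the chunks `"Hi th"`, `"ere. How "`, `"are you?"` produce two messages: `"Hi there. "` and `"How are you?<flush>"`.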
Adapter Checklist
- Replace provider URLs with matching Gradium endpoints.
- Change auth to x-api-key, or use browser-safe tokens for client apps.
- For STT, map query params into setup and json_config.
- For STT WebSocket, wrap audio bytes as base64 JSON messages.
- For TTS, replace provider voice/model names with a Gradium voice_id.
- For TTS WebSocket, send a Gradium setup message before text.
- Replace provider finality fields with Gradium end_text, step, and flushed handling.
- Keep provider-specific features such as diarization behind adapter capability checks.
Next steps
Gradium STT WebSocket guide
Real-time audio streaming, semantic VAD, and flush.
Turn-taking recipe
Replace endpointing and speech-final logic with Gradium VAD.
Gradium TTS WebSocket guide
Streaming text-to-speech over WebSocket.
WebSocket Lifecycle
Setup, ready, input, flush, end, multiplexing, and errors.