Keep your existing adapter shape: audio in still produces transcript events, and text in still produces audio bytes or chunks. Move the implementation to Gradium endpoints, x-api-key auth, Gradium voice_id values, and Gradium message types. For realtime STT, Gradium adds semantic VAD, adaptive delay, and explicit flush handling for turn-taking.

Endpoint Swap

Flow | Existing integration | Gradium
Pre-recorded STT | POST https://api.deepgram.com/v1/listen | POST https://api.gradium.ai/api/post/speech/asr
Streaming STT | wss://api.deepgram.com/v1/listen | wss://api.gradium.ai/api/speech/asr
One-shot TTS | POST https://api.deepgram.com/v1/speak | POST https://api.gradium.ai/api/post/speech/tts
Streaming TTS | wss://api.deepgram.com/v1/speak | wss://api.gradium.ai/api/speech/tts
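
In an adapter, the swap can live in one lookup keyed by URL scheme and path. The sketch below is illustrative only: `GRADIUM_BY_FLOW` and `swap_endpoint` are hypothetical names, and only the URLs come from the table above. Query parameters are dropped on purpose, since Gradium takes those settings in the setup message rather than the URL.

```python
from urllib.parse import urlsplit

# (scheme, path) of the existing endpoint -> Gradium endpoint from the table above.
GRADIUM_BY_FLOW = {
    ("https", "/v1/listen"): "https://api.gradium.ai/api/post/speech/asr",
    ("wss", "/v1/listen"): "wss://api.gradium.ai/api/speech/asr",
    ("https", "/v1/speak"): "https://api.gradium.ai/api/post/speech/tts",
    ("wss", "/v1/speak"): "wss://api.gradium.ai/api/speech/tts",
}


def swap_endpoint(existing_url):
    """Return the Gradium URL for an existing endpoint URL (hypothetical helper)."""
    parts = urlsplit(existing_url)
    key = (parts.scheme, parts.path.rstrip("/"))
    try:
        return GRADIUM_BY_FLOW[key]
    except KeyError:
        raise ValueError(f"no Gradium endpoint mapped for {existing_url!r}") from None
```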

Auth Mapping

Existing auth | Gradium
Authorization: Token $DEEPGRAM_API_KEY | x-api-key: $GRADIUM_API_KEY
Authorization: Bearer <token> | Browser/mobile clients should use Gradium temporary WebSocket tokens. See Browser WebSockets.
API key in WebSocket headers | x-api-key header, or ?token=... for short-lived browser tokens.
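
The two WebSocket auth paths can be kept behind one helper so the rest of the adapter does not care which one is in use. This is a minimal sketch: `gradium_ws_auth` is a hypothetical name, and how you obtain a temporary token is covered in Browser WebSockets and not shown here.

```python
from urllib.parse import urlencode


def gradium_ws_auth(base_url, api_key=None, temp_token=None):
    """Return (url, headers) for a WebSocket connection (hypothetical helper)."""
    if temp_token is not None:
        # Browser/mobile path: short-lived token in the query string, no secret header.
        separator = "&" if "?" in base_url else "?"
        return f"{base_url}{separator}{urlencode({'token': temp_token})}", {}
    if api_key is not None:
        # Server path: long-lived key in the x-api-key header.
        return base_url, {"x-api-key": api_key}
    raise ValueError("provide either api_key or temp_token")
```

Pass the returned URL and headers to whichever WebSocket client your adapter already uses.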

Gradium Speech-to-Text POST

For pre-recorded audio, keep sending the audio bytes in the request body and switch the URL. Gradium streams newline-delimited JSON messages back.
Gradium
curl -L -X POST https://api.gradium.ai/api/post/speech/asr \
  -H "x-api-key: $GRADIUM_API_KEY" \
  -H "Content-Type: audio/wav" \
  --data-binary @input.wav
Gradium’s response is NDJSON. Build the transcript by collecting text messages and pairing them with end_text when you need segment end timestamps.
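
Collecting the NDJSON stream into segments might look like the sketch below. It relies only on the `text` and `end_text` message shapes shown in this guide. Two things are assumptions, not documented behavior: each `end_text` is paired with the oldest segment that still lacks an end time, and segment texts are joined with single spaces. The function names are hypothetical.

```python
import json


def segments_from_ndjson(lines):
    """Build transcript segments from NDJSON lines (str or bytes)."""
    segments = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between messages
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "text":
            segments.append(
                {"text": msg.get("text", ""), "start_s": msg.get("start_s"), "stop_s": None}
            )
        elif kind == "end_text":
            # Assumption: end_text closes the oldest segment without an end time.
            for segment in segments:
                if segment["stop_s"] is None:
                    segment["stop_s"] = msg.get("stop_s")
                    break
        # Other message types are ignored in this sketch.
    return segments


def transcript(segments):
    """Join segment texts with single spaces (a joining policy assumed here)."""
    return " ".join(s["text"].strip() for s in segments if s["text"].strip())
```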

Speech-to-Text WebSocket

Gradium’s direct WebSocket protocol sends audio in JSON messages with base64 payloads:
Gradium messages
{"type":"setup","model_name":"default","input_format":"pcm","json_config":{"language":"en"}}
{"type":"audio","audio":"base64_encoded_audio"}
{"type":"end_of_stream"}
Existing STT concept | Gradium STT message
WebSocket connect | Connect to wss://api.gradium.ai/api/speech/asr
Model and language settings | setup.model_name, setup.json_config.language
Audio frame | {"type":"audio","audio":"base64_encoded_audio"}
Transcript result | {"type":"text","text":"...","start_s":...}
Final segment timing | {"type":"end_text","stop_s":...}
Endpointing / turn signal | step.vad[*].inactivity_prob
Close/finalize stream | {"type":"end_of_stream"}
Force pending output | {"type":"flush","flush_id":1}
Set input_format to match the audio you send: pcm, wav, opus, ulaw_8000, alaw_8000, or another supported Gradium format.
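
If your existing client sends raw binary frames, the main change is wrapping each chunk as base64 inside a JSON message. The sketch below builds the outgoing message sequence without opening a socket, so it works with any WebSocket client. The function names are hypothetical; the message fields are the ones shown above.

```python
import base64
import json


def stt_setup(input_format="pcm", language="en", model_name="default"):
    """Serialize the setup message that opens an STT stream."""
    return json.dumps({
        "type": "setup",
        "model_name": model_name,
        "input_format": input_format,
        "json_config": {"language": language},
    })


def stt_audio(chunk):
    """Wrap raw audio bytes as a base64 JSON audio message."""
    return json.dumps({"type": "audio", "audio": base64.b64encode(chunk).decode("ascii")})


def stt_messages(chunks, **setup_kwargs):
    """Yield setup, one audio message per non-empty chunk, then end_of_stream."""
    yield stt_setup(**setup_kwargs)
    for chunk in chunks:
        if chunk:
            yield stt_audio(chunk)
    yield json.dumps({"type": "end_of_stream"})
```

Send each yielded string as a text frame, in order, over the connection to wss://api.gradium.ai/api/speech/asr.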

STT Feature Mapping

Existing feature | Gradium equivalent | Migration note
model=nova-* | model_name: "default" unless given another Gradium model alias | Model names are provider-specific.
language=en-US | json_config: {"language": "en"} | Gradium uses short language codes: en, fr, de, es, pt.
Interim transcript events | Streaming text plus end_text segment finalization | Gradium emits text segments and separate end timestamps.
speech_final / endpointing | Semantic VAD step messages | Use inactivity probabilities across horizons; see Turn-Taking.
utterance_end_ms | VAD thresholding plus send_flush() | Implement turn-end policy in your app.
encoding, sample_rate | input_format such as pcm_16000, ulaw_8000 | Pick the exact Gradium format for your audio bytes.
punctuate, smart_format | Built into model behavior where available | There is no one-to-one request flag.
diarize, channels, alternatives | No direct public Gradium equivalent in these docs | Keep provider-specific code behind an adapter if you rely on these.
Word-level timestamps | Segment timestamps | Gradium currently documents segment-level timestamps.
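
The mapping above can be sketched as one translation function that turns existing query parameters into a setup message and reports what it could not carry over. Treat the details as assumptions: `setup_from_query` is a hypothetical name, the `linear16`/`mulaw`/`alaw` keys reflect the existing provider's encoding names, and only Gradium format names that appear in this guide are mapped. Confirm the right `input_format` for your audio before relying on it.

```python
# Assumed pairing of existing (encoding, sample_rate) values with format names
# mentioned in this guide. Anything else is rejected, never guessed.
INPUT_FORMATS = {
    ("linear16", 16000): "pcm_16000",
    ("mulaw", 8000): "ulaw_8000",
    ("alaw", 8000): "alaw_8000",
}

# Flags with no request-level equivalent; behavior is built into the model.
NO_REQUEST_FLAG = {"punctuate", "smart_format"}


def setup_from_query(params):
    """Translate existing STT query params into (setup_message, dropped_keys)."""
    params = dict(params)
    params.pop("model", None)  # model names are provider-specific
    language = str(params.pop("language", "en")).split("-")[0].lower()
    encoding = params.pop("encoding", None)
    rate = params.pop("sample_rate", None)
    key = (encoding, int(rate) if rate is not None else None)
    if key not in INPUT_FORMATS:
        raise ValueError(f"no input_format mapped for encoding/sample_rate {key}")
    setup = {
        "type": "setup",
        "model_name": "default",
        "input_format": INPUT_FORMATS[key],
        "json_config": {"language": language},
    }
    # Everything left over (diarize, channels, ...) has no equivalent here.
    dropped = sorted(k for k in params if k not in NO_REQUEST_FLAG)
    return setup, dropped
```

Surfacing `dropped` lets the adapter fail loudly, or fall back to the previous provider, when a caller depends on a feature such as diarization.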

Gradium Text-to-Speech POST

Gradium uses voice_id in the JSON body and returns raw audio bytes when only_audio is true, so your existing file-write or playback code can usually remain unchanged.
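
For adapters written in Python, the request can be sketched with the standard library alone. This is a minimal sketch, not an official client: `build_tts_request` and `synthesize_to_file` are hypothetical helper names, and the body fields are the ones this section describes.

```python
import json
import urllib.request

TTS_URL = "https://api.gradium.ai/api/post/speech/tts"


def build_tts_request(text, voice_id, api_key, output_format="wav"):
    """Build (but do not send) a one-shot TTS request."""
    body = {
        "text": text,
        "voice_id": voice_id,
        "output_format": output_format,
        "only_audio": True,  # ask for raw audio bytes in the response body
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request, path):
    """Send the request and write the response bytes to path (network I/O)."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

Separating request construction from sending keeps the builder easy to unit test. Note that urllib handles redirected POST requests differently from curl -L, so check the behavior if the endpoint redirects.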
Gradium
curl -L -X POST https://api.gradium.ai/api/post/speech/tts \
  -H "x-api-key: $GRADIUM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from Gradium.",
    "voice_id": "YTpq7expH9539ERJ",
    "output_format": "wav",
    "only_audio": true
  }' \
  > output.wav

Text-to-Speech WebSocket

For streaming TTS, use Gradium setup, text, and end_of_stream messages:
Existing TTS behavior | Gradium TTS message
Send text | {"type":"text","text":"..."} after setup
Force generation at a boundary | Include <flush> in a text message at a natural boundary
Finish generation | {"type":"end_of_stream"} for graceful completion, or close the socket to interrupt
Binary/audio response frames | {"type":"audio","audio":"base64..."}
Metadata response | ready message plus request_id
Warning response | error message for terminal errors; validate chunking and limits in your app
Gradium messages
{"type":"setup","voice_id":"YTpq7expH9539ERJ","model_name":"default","output_format":"pcm"}
{"type":"text","text":"Hello from Gradium. <flush>"}
{"type":"end_of_stream"}
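
When the text comes from an LLM, one possible approach is to send one text message per sentence and flush only at the end of the answer. The sketch below builds that outgoing sequence. The helper name, the naive regex sentence splitter, and the trailing space kept between sentences are all assumptions of this example, not Gradium requirements.

```python
import json
import re


def tts_messages(answer, voice_id, output_format="pcm", model_name="default"):
    """Yield setup, one text message per sentence, then end_of_stream."""
    yield json.dumps({
        "type": "setup",
        "voice_id": voice_id,
        "model_name": model_name,
        "output_format": output_format,
    })
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    for index, sentence in enumerate(sentences):
        is_last = index == len(sentences) - 1
        # Flush once, at the answer boundary; earlier sentences keep a trailing
        # space (assumed harmless) so words do not run together.
        text = f"{sentence} <flush>" if is_last else f"{sentence} "
        yield json.dumps({"type": "text", "text": text})
    yield json.dumps({"type": "end_of_stream"})
```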

TTS Feature Mapping

Existing feature | Gradium equivalent | Migration note
Speak REST /v1/speak | POST /api/post/speech/tts | Use only_audio: true for raw audio responses.
Speak WebSocket /v1/speak | wss://api.gradium.ai/api/speech/tts | Send setup before text.
model=aura-* | voice_id plus optional model_name | Choose a Gradium voice from the voice library or your custom voice.
Output encoding / sample_rate | output_format | Use pcm, pcm_16000, ulaw_8000, wav, opus, etc.
Flush command | <flush> text tag | Flush sparingly, usually after LLM sentence or answer boundaries.
Close command | end_of_stream or close socket | Use end_of_stream for graceful completion.
TTS metadata | ready.request_id, ready.sample_rate, ready.frame_size | Log request_id for debugging.
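
On the receiving side, an adapter that used to append binary frames now decodes base64 audio messages and keeps the ready metadata for logging. The sketch below assumes the message `type` values are literally "ready", "audio", and "error", and that the ready fields sit at the top level of the message. `collect_tts_audio` and `TTSError` are hypothetical names.

```python
import base64
import json


class TTSError(RuntimeError):
    """Raised when the stream reports a terminal error."""


def collect_tts_audio(raw_messages):
    """Return (audio_bytes, ready_metadata) from an iterable of JSON strings."""
    audio = bytearray()
    ready = {}
    for raw in raw_messages:
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "ready":
            # Keep the documented metadata; request_id is worth logging.
            ready = {k: msg.get(k) for k in ("request_id", "sample_rate", "frame_size")}
        elif kind == "audio":
            audio.extend(base64.b64decode(msg["audio"]))
        elif kind == "error":
            raise TTSError(f"request {ready.get('request_id')}: {msg}")
    return bytes(audio), ready
```

For low-latency playback you would hand each decoded chunk to the audio device as it arrives. Buffering everything, as done here, only suits file output.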

Adapter Checklist

  • Replace provider URLs with matching Gradium endpoints.
  • Change auth to x-api-key, or use browser-safe tokens for client apps.
  • For STT, map query params into setup and json_config.
  • For STT WebSocket, wrap audio bytes as base64 JSON messages.
  • For TTS, replace provider voice/model names with a Gradium voice_id.
  • For TTS WebSocket, send a Gradium setup message before text.
  • Replace provider finality fields with Gradium end_text, step, and flushed handling.
  • Keep provider-specific features such as diarization behind adapter capability checks.
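
For the finality item in the checklist, a turn-end policy can be sketched as a small state machine over decoded STT messages. Everything beyond the message shapes shown earlier is an assumption: the "step" and "flushed" type strings, that a flushed message echoes `flush_id`, which VAD horizon to read, and the 0.5 default threshold. `TurnTracker` is a hypothetical class, and the Turn-Taking recipe covers the recommended policy.

```python
class TurnTracker:
    """Accumulate text segments and decide when to request a flush."""

    def __init__(self, threshold=0.5, horizon=-1):
        self.threshold = threshold
        self.horizon = horizon   # index into step.vad; the choice is app policy
        self.pending = []        # text of the turn in progress
        self.turns = []          # completed turns
        self.last_stop_s = None  # end time of the latest finished segment
        self.awaiting = None     # flush_id we are waiting on, if any
        self._next_id = 1

    def handle(self, msg):
        """Process one decoded message; return a flush message to send, or None."""
        kind = msg.get("type")
        if kind == "text":
            self.pending.append(msg.get("text", ""))
        elif kind == "end_text":
            self.last_stop_s = msg.get("stop_s")
        elif kind == "step" and self.pending and self.awaiting is None:
            vad = msg.get("vad") or []
            if vad and vad[self.horizon]["inactivity_prob"] >= self.threshold:
                self.awaiting = self._next_id
                self._next_id += 1
                return {"type": "flush", "flush_id": self.awaiting}
        elif kind == "flushed" and self.awaiting is not None:
            if msg.get("flush_id") == self.awaiting:
                # Pending output has been forced out; close the turn.
                self.turns.append(" ".join(self.pending))
                self.pending = []
                self.awaiting = None
        return None
```

Feed every incoming message to `handle` and send any returned flush message back over the socket. Completed turns appear in `turns` once the matching flushed message arrives.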

Next steps

  • Gradium STT WebSocket guide: Real-time audio streaming, semantic VAD, and flush.
  • Turn-taking recipe: Replace endpointing and speech-final logic with Gradium VAD.
  • Gradium TTS WebSocket guide: Streaming text-to-speech over WebSocket.
  • WebSocket Lifecycle: Setup, ready, input, flush, end, multiplexing, and errors.