wscat -c "wss://api.gradium.ai/api/speech/s2s" \
  -H "x-api-key: YOUR_API_KEY"

Lifecycle

{"type":"setup","model_name":"default","input_format":"pcm","output_format":"pcm","voice_id":"YTpq7expH9539ERJ","json_config":{"target_language":"en"}}
{"type":"audio","audio":"base64_encoded_audio"}
{"type":"end_of_stream"}
The server responds with ready, then text and audio messages as available, and finally end_of_stream. The protocol combines the STT input side (audio in) with the TTS output side (text and audio out). See WebSocket Lifecycle for connection behavior, reusable sockets, browser tokens, and errors.
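The client side of this exchange can be sketched as a generator that yields the JSON text frames in order. This is a minimal sketch, not an official client: the function name `client_frames` and the 3840-byte chunk size are assumptions made for illustration, and the frames would still have to be sent over a WebSocket opened with the `x-api-key` header.

```python
import base64
import json
from typing import Iterator


def client_frames(pcm: bytes, chunk_size: int = 3840) -> Iterator[str]:
    """Yield setup, audio, and end_of_stream frames for one request.

    chunk_size is an illustrative assumption, not a documented requirement.
    """
    # 1. Configure the request before sending any audio.
    yield json.dumps({
        "type": "setup",
        "model_name": "default",
        "input_format": "pcm",
        "output_format": "pcm",
    })
    # 2. Stream the input as base64-encoded chunks.
    for start in range(0, len(pcm), chunk_size):
        chunk = pcm[start:start + chunk_size]
        yield json.dumps({
            "type": "audio",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
    # 3. Signal that no more audio will follow.
    yield json.dumps({"type": "end_of_stream"})
```

Keeping frame construction separate from the transport makes the message order easy to check without a live connection.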

Client Messages

setup

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | Always "setup". |
| model_name | string | No | Model alias, defaults to "default". |
| stt_model_name | string | No | Speech-to-text model used to transcribe the input. |
| tts_model_name | string | No | Text-to-speech model used to synthesize the output. |
| input_format | string | No | pcm, wav, opus, ulaw_8000, alaw_8000, or explicit PCM rates such as pcm_16000. Defaults to wav. |
| output_format | string | No | wav, pcm, opus, ulaw_8000, etc. Defaults to wav. |
| voice_id | string | No | Voice used for the synthesized output. See Voices. |
| json_config | object or string | No | Advanced settings. Set target_language to translate the speech; omit it to keep the original language. |
| client_req_id | string | No | Correlates multiplexed requests. |
| close_ws_on_eos | boolean | No | Defaults to true; set false to keep the socket open. |
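Because most setup fields are optional, a client may prefer to send only the fields it has set and let the server apply its defaults. The helper below is a hedged sketch of that idea; `build_setup` and its parameter names are hypothetical, and it passes `json_config` as an object rather than as a string.

```python
import json
from typing import Optional


def build_setup(
    voice_id: Optional[str] = None,
    target_language: Optional[str] = None,
    input_format: str = "pcm",
    output_format: str = "pcm",
    close_ws_on_eos: Optional[bool] = None,
) -> str:
    """Serialize a setup message, leaving unset optional fields out."""
    message = {
        "type": "setup",
        "input_format": input_format,
        "output_format": output_format,
    }
    if voice_id is not None:
        message["voice_id"] = voice_id
    if target_language is not None:
        # Translation is requested through json_config.target_language.
        message["json_config"] = {"target_language": target_language}
    if close_ws_on_eos is not None:
        # False asks the server to keep the socket open after end_of_stream.
        message["close_ws_on_eos"] = close_ws_on_eos
    return json.dumps(message)
```

Calling `build_setup()` with no arguments leaves out `json_config`, which keeps the speech in its original language.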

audio

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | Always "audio". |
| audio | string | Yes | Base64-encoded input audio chunk. |
| client_req_id | string | No | Required when routing a multiplexed request. |

end_of_stream

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| type | string | Yes | Always "end_of_stream". |
| client_req_id | string | No | Ends the matching multiplexed request. |
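When several requests share one socket, `client_req_id` is what ties each message to its request. The sketch below assumes every message of a multiplexed request carries the same ID, which this page implies but does not spell out; `tag` and `group_by_request` are hypothetical helper names.

```python
import json
from collections import defaultdict
from typing import Dict, List


def tag(message: dict, client_req_id: str) -> str:
    """Return the message as JSON with client_req_id attached."""
    return json.dumps({**message, "client_req_id": client_req_id})


def group_by_request(raw_messages: List[str]) -> Dict[str, List[dict]]:
    """Group incoming server messages by their client_req_id."""
    grouped: Dict[str, List[dict]] = defaultdict(list)
    for raw in raw_messages:
        msg = json.loads(raw)
        # Messages without an ID are collected under the empty string.
        grouped[msg.get("client_req_id", "")].append(msg)
    return dict(grouped)
```

For example, `tag({"type": "end_of_stream"}, "req-a")` ends only the request labeled `req-a`, leaving other requests on the socket untouched.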

Server Messages

ready

| Field | Type | Description |
| --- | --- | --- |
| type | string | Always "ready". |
| request_id | string | Gradium request ID for logging and support. |
| model_name | string | Requested model alias. |
| sample_rate | integer | Output sample rate in Hz. |
| frame_size | integer | Output frame size in samples. |
| client_req_id | string | Present for multiplexed requests. |
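Since `sample_rate` is given in Hz and `frame_size` in samples, the duration of one output frame follows by division. The snippet below is a small sketch of that arithmetic; the function name is hypothetical, and the values in the usage note are made up rather than taken from a real `ready` message.

```python
def frame_duration_s(ready: dict) -> float:
    """Duration of one output frame in seconds: samples / (samples per second)."""
    return ready["frame_size"] / ready["sample_rate"]
```

With an illustrative `sample_rate` of 48000 and `frame_size` of 1920, each frame would last 1920 / 48000 = 0.04 seconds.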

text

| Field | Type | Description |
| --- | --- | --- |
| type | string | Always "text". |
| text | string | Transcribed (and translated, if target_language is set) text segment. |
| start_s | number | Segment start time in seconds. |
| stop_s | number | Segment stop time in seconds. |
| stream_id | integer | Stream identifier, when present. |
| client_req_id | string | Present for multiplexed requests. |

audio

| Field | Type | Description |
| --- | --- | --- |
| type | string | Always "audio". |
| audio | string | Base64-encoded output audio chunk. |
| start_s | number | Chunk start time in seconds. |
| stop_s | number | Chunk stop time in seconds. |
| stream_id | integer | Stream identifier, when present. |
| client_req_id | string | Present for multiplexed requests. |

Terminal messages

| Type | Description |
| --- | --- |
| end_of_stream | The request is complete. |
| error | Terminal error message; the socket closes after the error. |
Error
{"type":"error","message":"Error description","code":1008}
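A receiver for the server messages described in this section can be sketched as a loop that collects text and decoded audio until a terminal message arrives. This is an illustrative sketch only: `collect_response` and `S2SError` are hypothetical names, joining text segments with a space is an assumption, and the input is any iterable of JSON strings rather than a live socket.

```python
import base64
import json
from typing import Iterable, List, Tuple


class S2SError(RuntimeError):
    """Raised on a terminal error message (hypothetical class name)."""


def collect_response(raw_messages: Iterable[str]) -> Tuple[str, bytes]:
    """Return (joined text, concatenated audio bytes) for one request."""
    texts: List[str] = []
    audio = bytearray()
    for raw in raw_messages:
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "text":
            texts.append(msg["text"])
        elif kind == "audio":
            # Audio chunks arrive base64-encoded.
            audio.extend(base64.b64decode(msg["audio"]))
        elif kind == "end_of_stream":
            break  # The request is complete.
        elif kind == "error":
            raise S2SError(f"{msg.get('code')}: {msg.get('message')}")
        # "ready" and unrecognized types carry nothing to collect here.
    # Joining segments with a space is an assumption of this sketch.
    return " ".join(texts), bytes(audio)
```

Raising on `error` mirrors the fact that the socket closes after an error, so there is nothing further to read for that connection.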

Headers

x-api-key (string, required): Your Gradium API key.

Response

101: WebSocket connection established.