S2S WebSocket Stream

wscat -c "wss://api.gradium.ai/api/speech/s2s" \
  -H "x-api-key: YOUR_API_KEY"

Lifecycle

{"type":"setup","model_name":"s2s-translate","stt_model_name":"stt-translate","tts_model_name":"default","input_format":"pcm","output_format":"pcm","voice_id":"YTpq7expH9539ERJ","json_config":{"target_language":"en"}}
{"type":"audio","audio":"base64_encoded_audio"}
{"type":"end_of_stream"}

The client sends `setup` first, then one or more `audio` messages, then `end_of_stream`, as shown above. The server responds with `ready`, then `text` and `audio` messages as they become available, and finally `end_of_stream`. The protocol combines the STT input side (audio in) with the TTS output side (text and audio out). See WebSocket Lifecycle for connection behavior, reusable sockets, browser tokens, and errors.
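The client side of this lifecycle can be sketched in Python as a set of message builders. This is a minimal sketch rather than an official client: the function names (`setup_message`, `audio_message`, `end_of_stream_message`, `client_messages`) are hypothetical, the field values are copied from the example above, and sending the strings over an actual WebSocket connection is left out.

```python
import base64
import json


def setup_message(target_language="en", voice_id=None):
    """Build the setup message, which is sent first on the socket."""
    msg = {
        "type": "setup",
        "model_name": "s2s-translate",
        "stt_model_name": "stt-translate",
        "tts_model_name": "default",
        "input_format": "pcm",
        "output_format": "pcm",
        "json_config": {"target_language": target_language},
    }
    if voice_id is not None:
        # voice_id is optional in the setup table, so only send it when given.
        msg["voice_id"] = voice_id
    return json.dumps(msg)


def audio_message(chunk: bytes) -> str:
    """Wrap one raw audio chunk as a base64-encoded audio message."""
    encoded = base64.b64encode(chunk).decode("ascii")
    return json.dumps({"type": "audio", "audio": encoded})


def end_of_stream_message() -> str:
    """Build the message that tells the server no more audio will follow."""
    return json.dumps({"type": "end_of_stream"})


def client_messages(chunks, **setup_kwargs):
    """Yield messages in lifecycle order: setup, audio..., end_of_stream."""
    yield setup_message(**setup_kwargs)
    for chunk in chunks:
        yield audio_message(chunk)
    yield end_of_stream_message()
```

Each yielded string is one text frame to send on the socket, in order.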

Client Messages

`setup`

Field	Type	Required	Description
`type`	string	Yes	Always `"setup"`.
`model_name`	string	Yes	Model alias. Use `"s2s-translate"` for live translation, the only currently supported model.
`stt_model_name`	string	No	Speech-to-text model used to transcribe the input. Set to `"stt-translate"` for live translation.
`tts_model_name`	string	No	Text-to-speech model used to synthesize the output. Set to `"default"` for live translation.
`input_format`	string	Yes	`pcm`, `wav`, `opus`, `ulaw_8000`, `alaw_8000`, or explicit PCM rates such as `pcm_24000`.
`output_format`	string	Yes	`wav`, `pcm`, `opus`, `ulaw_8000`, etc.
`voice_id`	string	No	Voice used for the synthesized output. See Voices.
`json_config`	object or string	No	Advanced settings. Set `target_language` to translate the speech; omit it to keep the original language.
`client_req_id`	string	No	Correlates multiplexed requests.
`close_ws_on_eos`	boolean	No	Defaults to `true`; set `false` to keep the socket open.

With `pcm` as the input format, input audio is expected at 24 kHz with 16-bit little-endian samples. With `pcm` as the output format, output audio is expected at 48 kHz with 16-bit little-endian samples.
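To illustrate the `pcm` input layout, the sketch below converts floating-point samples to 16-bit little-endian bytes and splits them into chunks for `audio` messages. Several things here are assumptions not stated on this page: the audio is treated as mono, the 80 ms chunk length is an arbitrary choice, and the helper names are hypothetical.

```python
import struct

INPUT_SAMPLE_RATE = 24_000  # Hz, per the note above for "pcm" input
BYTES_PER_SAMPLE = 2        # 16-bit samples


def floats_to_pcm16le(samples):
    """Convert floats in [-1.0, 1.0] to 16-bit little-endian PCM bytes."""
    # Clamp first so out-of-range values cannot overflow the 16-bit range.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)


def split_chunks(pcm: bytes, chunk_ms=80):
    """Split PCM bytes into chunks of chunk_ms milliseconds each.

    Assumes mono audio; chunk_ms is an illustrative value, not a
    documented requirement.
    """
    chunk_bytes = INPUT_SAMPLE_RATE * chunk_ms // 1000 * BYTES_PER_SAMPLE
    return [pcm[i:i + chunk_bytes] for i in range(0, len(pcm), chunk_bytes)]
```

One second of mono input at this rate is 24,000 samples, or 48,000 bytes before base64 encoding.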

`audio`

Field	Type	Required	Description
`type`	string	Yes	Always `"audio"`.
`audio`	string	Yes	Base64-encoded input audio chunk.
`client_req_id`	string	No	Required when routing a multiplexed request.

`end_of_stream`

Field	Type	Required	Description
`type`	string	Yes	Always `"end_of_stream"`.
`client_req_id`	string	No	Ends the matching multiplexed request.

Server Messages

`ready`

Field	Type	Description
`type`	string	Always `"ready"`.
`request_id`	string	Gradium request ID for logging and support.
`model_name`	string	Requested model alias.
`sample_rate`	integer	Output sample rate in Hz.
`frame_size`	integer	Output frame size in samples.
`client_req_id`	string	Present for multiplexed requests.
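Since `frame_size` is given in samples and `sample_rate` in Hz, the duration of one output frame follows by division. The sketch below shows that arithmetic; the function names are hypothetical, the 2-byte sample width assumes 16-bit `pcm` output, and the values in the usage note are made up for illustration rather than taken from a real `ready` message.

```python
def frame_duration_ms(ready: dict) -> float:
    """Duration of one output frame in milliseconds."""
    return 1000.0 * ready["frame_size"] / ready["sample_rate"]


def frame_size_bytes(ready: dict, bytes_per_sample: int = 2) -> int:
    """Size of one output frame in bytes, assuming 16-bit mono samples."""
    return ready["frame_size"] * bytes_per_sample
```

For example, a hypothetical `ready` message with `sample_rate` 48000 and `frame_size` 3840 would describe 80 ms frames of 7,680 bytes each.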

`text`

Field	Type	Description
`type`	string	Always `"text"`.
`text`	string	Transcribed (and translated, if `target_language` is set) text segment.
`start_s`	number	Segment start time in seconds.
`stop_s`	number	Segment stop time in seconds.
`stream_id`	integer	Stream identifier, when present.
`client_req_id`	string	Present for multiplexed requests.

`audio`

Field	Type	Description
`type`	string	Always `"audio"`.
`audio`	string	Base64-encoded output audio chunk.
`start_s`	number	Chunk start time in seconds.
`stop_s`	number	Chunk stop time in seconds.
`stream_id`	integer	Stream identifier, when present.
`client_req_id`	string	Present for multiplexed requests.

Terminal messages

Type	Description
`end_of_stream`	The request is complete.
`error`	Terminal error message; the socket closes after the error.

Error

{"type":"error","message":"Error description","code":1008}
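The server messages above can be handled with a small dispatcher keyed on `type`. This is a sketch under stated assumptions, not a reference client: `S2SResult` and `handle_server_message` are hypothetical names, the optional fields are read with `.get()` because this page does not say whether they are always present, and receiving frames from the socket is left to whichever WebSocket library is in use.

```python
import base64
import json


class S2SResult:
    """Accumulates everything received for one request."""

    def __init__(self):
        self.request_id = None
        self.segments = []        # list of (start_s, stop_s, text)
        self.audio = bytearray()  # decoded output audio, concatenated
        self.done = False
        self.error = None         # (code, message) if the request failed


def handle_server_message(raw: str, result: S2SResult) -> bool:
    """Apply one server message; return True once a terminal message arrives."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "ready":
        result.request_id = msg.get("request_id")
    elif kind == "text":
        result.segments.append((msg.get("start_s"), msg.get("stop_s"), msg["text"]))
    elif kind == "audio":
        result.audio.extend(base64.b64decode(msg["audio"]))
    elif kind == "end_of_stream":
        result.done = True
    elif kind == "error":
        # Terminal: the socket closes after an error message.
        result.error = (msg.get("code"), msg.get("message"))
        result.done = True
    return result.done
```

A receive loop would call `handle_server_message` on each incoming text frame and stop reading once it returns `True`.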

Headers

Header	Type	Required	Description
`x-api-key`	string	Yes	Your Gradium API key.

Response

Status	Description
101	WebSocket connection established.
