Speech-to-Text
STT WebSocket Stream
Stream audio to Gradium speech-to-text over WebSocket for real-time transcription.
Lifecycle
The server sends ready, then text, end_text, step, and flushed messages as available, and finally end_of_stream. See WebSocket Lifecycle for connection behavior, reusable sockets, browser tokens, and errors.
Client Messages
setup
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Always "setup". |
| model_name | string | No | Model alias, defaults to "default". |
| input_format | string | No | pcm, wav, opus, ulaw_8000, alaw_8000, or explicit PCM rates such as pcm_16000. Defaults to wav. |
| json_config | object or string | No | Advanced STT settings. See Transcription Settings. |
| client_req_id | string | No | Correlates multiplexed requests. |
| close_ws_on_eos | boolean | No | Defaults to true; set false to keep the socket open. |
| retry_for_s | number | No | Optional setup retry window in seconds. |
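
The fields above can be assembled into a setup payload as in the following minimal sketch. It assumes messages travel as JSON text frames, which this page does not state explicitly; `build_setup_message` is a hypothetical helper, not part of any Gradium SDK.

```python
import json

def build_setup_message(input_format="pcm_16000", model_name="default",
                        close_ws_on_eos=True, client_req_id=None):
    """Return a JSON-encoded setup message built from the documented fields."""
    message = {
        "type": "setup",
        "model_name": model_name,
        "input_format": input_format,
        "close_ws_on_eos": close_ws_on_eos,
    }
    # client_req_id is optional; include it only when multiplexing requests.
    if client_req_id is not None:
        message["client_req_id"] = client_req_id
    return json.dumps(message)

setup_json = build_setup_message(client_req_id="req-1")
```

Optional fields such as json_config and retry_for_s are left out here and could be added to the dictionary in the same way.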
audio
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Always "audio". |
| audio | string | Yes | Base64-encoded audio chunk. |
| client_req_id | string | No | Required when routing a multiplexed request. |
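
As an illustration of the Base64 requirement, the sketch below splits a byte buffer into chunks and wraps each one in an audio message. The helper name `audio_messages` and the 3840-byte chunk size are assumptions chosen for the example; this page does not prescribe a chunk size.

```python
import base64
import json

def audio_messages(audio_bytes, chunk_size=3840, client_req_id=None):
    """Yield JSON audio messages, each carrying one Base64-encoded chunk."""
    for offset in range(0, len(audio_bytes), chunk_size):
        chunk = audio_bytes[offset:offset + chunk_size]
        message = {
            "type": "audio",
            # Base64 output is bytes; decode to str so it can be JSON-encoded.
            "audio": base64.b64encode(chunk).decode("ascii"),
        }
        if client_req_id is not None:
            message["client_req_id"] = client_req_id
        yield json.dumps(message)

# 10,000 zero bytes stand in for real audio data.
messages = list(audio_messages(bytes(10000)))
```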
flush
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Always "flush". |
| flush_id | integer | Yes | Echoed in the matching flushed response. |
| client_req_id | string | No | Required when routing a multiplexed request. |
end_of_stream
| Field | Type | Required | Description |
|---|---|---|---|
| type | string | Yes | Always "end_of_stream". |
| client_req_id | string | No | End the matching multiplexed request. |
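
Taken together, the client messages for a single request might be emitted in the order sketched below. This assumes a request opens with setup and closes with end_of_stream, and that messages are JSON text frames; `client_message_sequence` is a hypothetical helper.

```python
import base64
import json

def client_message_sequence(audio_chunks, input_format="pcm_16000"):
    """Yield client messages in order: setup, one audio per chunk, end_of_stream."""
    yield json.dumps({"type": "setup", "input_format": input_format})
    for chunk in audio_chunks:
        encoded = base64.b64encode(chunk).decode("ascii")
        yield json.dumps({"type": "audio", "audio": encoded})
    yield json.dumps({"type": "end_of_stream"})

# Two tiny placeholder chunks; real chunks would hold encoded audio.
sequence = [json.loads(m)["type"]
            for m in client_message_sequence([b"\x00\x00", b"\x01\x00"])]
```

Each yielded string would be passed to whatever WebSocket client is in use; the generator itself performs no network I/O.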
Server Messages
ready
| Field | Type | Description |
|---|---|---|
| type | string | Always "ready". |
| request_id | string | Gradium request ID for logging and support. |
| model_name | string | Requested model alias. |
| sample_rate | integer | Input sample rate after setup. |
| frame_size | integer | Frame size in samples. |
| delay_in_frames | integer | Model delay, in 80 ms frames. |
| text_stream_names | string[] | Named text streams, when present. |
| client_req_id | string | Present for multiplexed requests. |
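
The timing fields in ready can be converted to seconds with a little arithmetic, as in this sketch. The payload values are illustrative, not values the service is known to return, and `describe_timing` is a hypothetical helper.

```python
FRAME_S = 0.08  # the table states that delay_in_frames counts 80 ms frames

def describe_timing(ready):
    """Return (frame duration, model delay), both in seconds."""
    # frame_size is in samples, so dividing by the sample rate gives seconds.
    frame_duration_s = ready["frame_size"] / ready["sample_rate"]
    delay_s = ready["delay_in_frames"] * FRAME_S
    return frame_duration_s, delay_s

# Hypothetical ready payload, for illustration only.
ready = {"type": "ready", "sample_rate": 24000, "frame_size": 1920,
         "delay_in_frames": 6}
frame_duration_s, delay_s = describe_timing(ready)
```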
text
| Field | Type | Description |
|---|---|---|
| type | string | Always "text". |
| text | string | Transcribed text segment. |
| start_s | number | Segment start time in seconds. |
| stream_id | integer | Stream identifier, when present. |
| client_req_id | string | Present for multiplexed requests. |
end_text
| Field | Type | Description |
|---|---|---|
| type | string | Always "end_text". |
| stop_s | number | Stop time for the previous text segment. |
| stream_id | integer | Stream identifier, when present. |
| client_req_id | string | Present for multiplexed requests. |
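
Because end_text carries the stop time for the previous text segment, a client can pair the two messages to build timed segments. The sketch below assumes that pairing happens per stream_id, with messages lacking a stream_id treated as one stream; `assemble_segments` is a hypothetical helper operating on already-decoded messages.

```python
def assemble_segments(messages):
    """Pair each text message with the following end_text on the same stream."""
    segments = []
    open_by_stream = {}  # stream_id (or None) -> segment awaiting its stop time
    for msg in messages:
        stream_id = msg.get("stream_id")
        if msg["type"] == "text":
            segment = {"text": msg["text"], "start_s": msg["start_s"],
                       "stop_s": None}
            segments.append(segment)
            open_by_stream[stream_id] = segment
        elif msg["type"] == "end_text":
            segment = open_by_stream.pop(stream_id, None)
            if segment is not None:
                segment["stop_s"] = msg["stop_s"]
    return segments

# Illustrative messages; the last segment has not received its end_text yet.
segments = assemble_segments([
    {"type": "text", "text": "hello", "start_s": 0.4},
    {"type": "end_text", "stop_s": 0.9},
    {"type": "text", "text": "world", "start_s": 1.0},
])
```

A segment whose stop_s is still None has not been closed yet, which can be useful for rendering partial results.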
step
| Field | Type | Description |
|---|---|---|
| type | string | "step" or legacy "vad". |
| vad | object[] | Horizon predictions with horizon_s and inactivity_prob. |
| step_idx | integer | Step index. |
| step_duration_s | number | Step duration in seconds, usually 0.08. |
| total_duration_s | number | Audio duration processed so far. |
| client_req_id | string | Present for multiplexed requests. |
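
One possible use of the vad predictions is pause detection, sketched below. The reading of inactivity_prob as "probability that the speaker is inactive at that horizon", the 1.0 s horizon, and the 0.5 threshold are all assumptions made for illustration; the helper names are hypothetical.

```python
def is_step(msg):
    """Accept both the current type name and the legacy one."""
    return msg.get("type") in ("step", "vad")

def inactivity_at(step, horizon_s):
    """Return inactivity_prob for the prediction closest to horizon_s."""
    closest = min(step["vad"], key=lambda p: abs(p["horizon_s"] - horizon_s))
    return closest["inactivity_prob"]

def looks_like_pause(step, horizon_s=1.0, threshold=0.5):
    """Heuristic: treat a high inactivity probability as a likely pause."""
    return is_step(step) and inactivity_at(step, horizon_s) >= threshold

# Hypothetical step payload, for illustration only.
step = {
    "type": "step",
    "vad": [
        {"horizon_s": 0.5, "inactivity_prob": 0.2},
        {"horizon_s": 1.0, "inactivity_prob": 0.7},
        {"horizon_s": 2.0, "inactivity_prob": 0.9},
    ],
    "step_idx": 12,
    "step_duration_s": 0.08,
    "total_duration_s": 1.04,
}
pause = looks_like_pause(step)
```

Any threshold would need tuning against real audio before being relied on.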
flushed
| Field | Type | Description |
|---|---|---|
| type | string | Always "flushed". |
| flush_id | integer | The flush_id from the matching request. |
| client_req_id | string | Present for multiplexed requests. |
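
Since flushed echoes the flush_id of the request that triggered it, a client can track outstanding flushes with a small bookkeeping class. The sketch below is one hedged way to do that; `FlushTracker` and its sequential ID scheme are assumptions, as this page only requires flush_id to be an integer.

```python
class FlushTracker:
    """Track flush requests until the matching flushed message arrives."""

    def __init__(self):
        self._next_id = 1
        self._pending = set()

    def new_flush(self):
        """Return a flush message (as a dict) with a fresh flush_id."""
        flush_id = self._next_id
        self._next_id += 1
        self._pending.add(flush_id)
        return {"type": "flush", "flush_id": flush_id}

    def on_flushed(self, msg):
        """Mark a flush as done; return False if the ID was not pending."""
        if msg["flush_id"] in self._pending:
            self._pending.remove(msg["flush_id"])
            return True
        return False

    def pending(self):
        return sorted(self._pending)

tracker = FlushTracker()
first = tracker.new_flush()
second = tracker.new_flush()
matched = tracker.on_flushed({"type": "flushed", "flush_id": first["flush_id"]})
```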
Terminal messages
| Type | Description |
|---|---|
| end_of_stream | The request is complete. |
| error | Terminal error message; the socket closes after the error. |
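
A receive loop can use the two terminal types as its stopping condition. The sketch below works on an iterable of raw JSON strings so that it stays independent of any particular WebSocket library; `consume` and `StreamError` are hypothetical names, and because the fields of the error message are not listed here, the raw payload is passed through unchanged.

```python
import json

class StreamError(Exception):
    """Raised when the server reports an error or the stream ends early."""

def consume(raw_messages):
    """Collect text segments until a terminal message arrives."""
    texts = []
    for raw in raw_messages:
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "text":
            texts.append(msg["text"])
        elif kind == "end_of_stream":
            return texts  # the request is complete
        elif kind == "error":
            raise StreamError(raw)  # error fields are not documented here
        # ready, end_text, step, vad, and flushed are ignored in this sketch.
    raise StreamError("stream ended before a terminal message")

incoming = [
    json.dumps({"type": "ready"}),
    json.dumps({"type": "text", "text": "hello", "start_s": 0.4}),
    json.dumps({"type": "step", "vad": []}),
    json.dumps({"type": "text", "text": "world", "start_s": 1.0}),
    json.dumps({"type": "end_of_stream"}),
]
texts = consume(incoming)
```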
Error
Headers
Your Gradium API key
Response
101
WebSocket connection established