Gradium exposes Speech-to-Text over two transports. They share the same models; pick the transport that matches your audio source and latency needs.
## WebSocket vs REST
| If your audio is… | Use | Why |
|---|---|---|
| Live (microphone, telephony, agent loop) and you need sub-second turn-taking | WebSocket | Lowest latency. Push audio as it’s captured, get text and VAD signals as they’re produced. React to end-of-turn in real time. |
| A complete file you already have on disk or in memory | REST | One HTTP POST with the audio in the body. No connection to manage; the server streams NDJSON results back as the response body. |
| Already in hand, but you want VAD signals or to react to flush events | WebSocket | Use `stt_stream` (pull-based) over the WebSocket. Same low latency, no manual connection management. |
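For the REST path, the streamed response body is newline-delimited JSON, so each line is a complete result object. A minimal parsing sketch in Python; the endpoint URL and auth header in the comment are illustrative assumptions, not documented values:

```python
import json

def parse_ndjson(body: str):
    """Parse an NDJSON response body into a list of result dicts."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]

# In a real call you would stream the response, e.g. (hypothetical endpoint):
#   resp = requests.post("https://api.gradium.ai/v1/stt", data=audio_bytes,
#                        headers={"Authorization": f"Bearer {key}"}, stream=True)
#   for line in resp.iter_lines():
#       handle(json.loads(line))

sample = '{"text": "hello"}\n{"text": "world"}\n'
results = parse_ndjson(sample)
print([r["text"] for r in results])  # ['hello', 'world']
```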
## What both transports share
- Models: the same `model_name` works on both.
- Input formats: PCM (multiple sample rates), WAV, Opus, mu-law, A-law.
- Tunable options: `temp`, `language`, `padding_bonus`, `delay_in_frames` via `json_config`. See Transcription Settings.
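The shared options travel together in `json_config`. A sketch of assembling one in Python; the key names mirror the option names on this page, but the authoritative schema and valid ranges are on the Transcription Settings page:

```python
import json

# Key names assumed to match the option names above; values are
# illustrative placeholders, not recommended defaults.
options = {
    "language": "en",
    "temp": 0.0,
    "padding_bonus": 0.0,
    "delay_in_frames": 2,
}
json_config = json.dumps(options)
print(json_config)
```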
## What’s transport-specific
- WebSocket-only: real-time VAD `step` messages, in-stream `send_flush()` for forced processing, and setup-message stream controls (`send_setup_on_start`, `wait_for_ready_on_start`). See WebSocket Stream Options.
- REST-only: send the full audio as the request body; receive NDJSON over a streaming response.
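On the WebSocket side, the client multiplexes VAD `step` events and text results on a single connection. A minimal routing sketch in Python; the message shapes (a JSON `type` field, a `text` payload) and the commented connection loop are assumptions for illustration, while the real protocol lives on the WebSocket API page:

```python
import json

def handle_event(raw: str):
    """Route one incoming WebSocket message (assumed JSON with a 'type' field)."""
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "step":          # real-time VAD signal (WebSocket-only)
        return ("vad", event)
    if kind == "text":          # transcript fragment
        return ("text", event.get("text"))
    return ("other", event)

# The surrounding loop, sketched (URL and message shapes are illustrative):
#   async with websockets.connect(STT_URL) as ws:
#       await ws.send(setup_json)       # cf. send_setup_on_start
#       await ws.send(audio_chunk)      # push audio as it is captured
#       await ws.send(flush_json)       # cf. send_flush()
#       async for msg in ws:
#           handle_event(msg)

print(handle_event('{"type": "text", "text": "hi"}'))  # ('text', 'hi')
```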
## Next steps
- **Use the WebSocket API**: streaming audio in, transcripts and VAD signals out, with flush control.
- **Use the REST API**: one-shot transcription of a complete audio file.
- **Transcription settings**: `language`, `temp`, `padding_bonus`, `delay_in_frames`.
- **Errors**: error contracts across REST, WebSocket, and streamed responses.