Documentation Index

Fetch the complete documentation index at: https://docs.gradium.ai/llms.txt

Use this file to discover all available pages before exploring further.

Gradium exposes Speech-to-Text over two transports. They share the same models; pick the transport that matches your audio source and latency needs.

WebSocket vs REST

| If your audio is… | Use | Why |
| --- | --- | --- |
| Live (microphone, telephony, agent loop) and you need sub-second turn-taking | WebSocket | Lowest latency. Push audio as it's captured; get text and voice activity detection (VAD) signals as they're produced. React to end-of-turn in real time. |
| A complete file you already have on disk or in memory | REST | One HTTP POST with the audio in the body. No connection to manage; the server streams NDJSON results back as the response body. |
| In hand, but you want VAD signals or to react to flush events | WebSocket | Use stt_stream (pull-based) over the WebSocket. Same low latency, no manual connection management. |
If you’re transcribing pre-recorded audio and don’t need VAD, REST is the simpler path. Move to WebSocket when you need live audio, turn-taking, or in-stream flush.
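For the pre-recorded case, the REST flow can be sketched as below. The endpoint URL, auth header, content type, and response field names here are illustrative assumptions, not the documented contract; the NDJSON handling (one JSON object per line of the streamed response body) follows the transport description above.

```python
import json
import urllib.request

def iter_ndjson(lines):
    """Parse an iterable of NDJSON lines (bytes or str) into dicts,
    skipping blank lines."""
    for line in lines:
        if isinstance(line, bytes):
            line = line.decode("utf-8")
        line = line.strip()
        if line:
            yield json.loads(line)

def transcribe_file(path, api_key, url="https://api.gradium.ai/v1/stt"):
    # Hypothetical endpoint and header names -- check the REST API page
    # for the real values.
    with open(path, "rb") as f:
        audio = f.read()
    req = urllib.request.Request(
        url,
        data=audio,  # the full audio file is the request body
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "audio/wav",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        # The server streams NDJSON: one JSON result object per line.
        for result in iter_ndjson(resp):
            print(result.get("text", ""))
```

Because the response is streamed, results can be consumed line by line as they arrive rather than after the whole transcription finishes.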

What both transports share

  • Models: same model_name works on both.
  • Input formats: PCM (multiple sample rates), WAV, Opus, mu-law, A-law.
  • Tunable options: temp, language, padding_bonus, delay_in_frames via json_config. See Transcription Settings.
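Since the tunable options travel together in a single json_config object, a small helper can assemble one; a minimal sketch, assuming the enclosing payload shape (the option names themselves are from the list above, but model_name placement and defaults are assumptions):

```python
import json

def build_json_config(language=None, temp=None, padding_bonus=None,
                      delay_in_frames=None):
    """Collect only the options the caller actually set, so server-side
    defaults apply to everything left as None."""
    options = {
        "language": language,
        "temp": temp,
        "padding_bonus": padding_bonus,
        "delay_in_frames": delay_in_frames,
    }
    return {k: v for k, v in options.items() if v is not None}

# Hypothetical enclosing payload -- see Transcription Settings for the
# real shape expected by each transport.
config = build_json_config(language="en", temp=0.0)
payload = json.dumps({"model_name": "default", "json_config": config})
```

Filtering out unset options keeps the payload minimal and lets the server apply its own defaults, which matters when defaults differ per model.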

What’s transport-specific

  • WebSocket-only: real-time VAD step messages, in-stream send_flush() for forced processing, setup-message stream controls (send_setup_on_start, wait_for_ready_on_start). See WebSocket Stream Options.
  • REST-only: send the full audio as the request body; receive NDJSON over a streaming response.
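On the WebSocket side, a client typically dispatches on message type as frames arrive; the type names ("text", "vad") and field names below are assumptions for illustration — see the WebSocket API page for the real message schema.

```python
import json

def handle_message(raw, on_text, on_vad):
    """Route one incoming WebSocket JSON text frame to the right
    callback and return the parsed message for inspection."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "text":
        on_text(msg["text"])   # incremental transcript fragment
    elif kind == "vad":
        on_vad(msg)            # VAD step, e.g. an end-of-turn signal
    return msg

transcript = []
handle_message('{"type": "text", "text": "hello"}',
               on_text=transcript.append, on_vad=lambda m: None)
```

Keeping the dispatch logic in a pure function like this makes it easy to unit-test turn-taking behavior without opening a real connection.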

Next steps

Use the WebSocket API

Streaming audio in, transcripts and VAD signals out, with flush control.

Use the REST API

One-shot transcription of a complete audio file.

Transcription settings

language, temp, padding_bonus, delay_in_frames.

Errors

Error contracts across REST, WebSocket, and streamed responses.