Speech-to-Text Overview

Gradium exposes Speech-to-Text over two transports. They share the same models; pick the transport that matches your audio source and latency needs.

WebSocket vs REST

| If your audio is… | Use | Why |
| --- | --- | --- |
| Live (microphone, telephony, agent loop) and you need sub-second turn-taking | WebSocket | Lowest latency. Push audio as it’s captured, get text and VAD signals as they’re produced. React to end-of-turn in real time. |
| A complete file you already have on disk or in memory | REST | One HTTP POST with the audio in the body. No connection to manage; the server streams NDJSON results back as the response body. |
| In hand but you want VAD signals or to react to flush events | WebSocket | Use `stt_stream` (pull-based) over the WebSocket. Same low latency, no manual connection management. |

If you’re transcribing pre-recorded audio and don’t need VAD, REST is the simpler path. Move to WebSocket when you need live audio, turn-taking, or in-stream flush.
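The decision rule above can be written down as a small helper. This is only an illustrative sketch: `choose_transport` and its parameter names are hypothetical and are not part of any Gradium SDK.

```python
def choose_transport(
    live_audio: bool,
    needs_vad: bool = False,
    needs_flush: bool = False,
) -> str:
    """Return "websocket" or "rest" following the guidance on this page.

    Hypothetical helper for illustration only; not a Gradium API.
    """
    # Live audio, VAD signals, and in-stream flush all require the WebSocket.
    if live_audio or needs_vad or needs_flush:
        return "websocket"
    # A complete file with no VAD or flush needs is the simple REST case.
    return "rest"
```

For example, a pre-recorded file that still needs VAD signals maps to the WebSocket: `choose_transport(live_audio=False, needs_vad=True)` returns `"websocket"`.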

What both transports share

Models: the same `model_name` works on both transports.
Input formats: PCM (multiple sample rates), WAV, Opus, mu-law, A-law.
Tunable options: `temp`, `language`, `padding_bonus`, and `delay_in_frames`, passed via `json_config`. See Transcription Settings.
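As a rough sketch of how the tunable options might be gathered into one `json_config` payload, the helper below collects only the options that were set and serializes them. The helper name `build_json_config` is hypothetical. The sample values are placeholders, and serializing to a JSON string is an assumption; the valid ranges and the exact way `json_config` is passed on each transport are covered in Transcription Settings, not here.

```python
import json
from typing import Optional


def build_json_config(
    language: Optional[str] = None,
    temp: Optional[float] = None,
    padding_bonus: Optional[float] = None,
    delay_in_frames: Optional[int] = None,
) -> str:
    """Serialize the options that were explicitly set into a JSON string.

    Hypothetical helper; the option names come from this page, everything
    else (types, serialization as a string) is an assumption.
    """
    options = {
        "language": language,
        "temp": temp,
        "padding_bonus": padding_bonus,
        "delay_in_frames": delay_in_frames,
    }
    # Leave unset options out so the server-side defaults apply.
    set_options = {key: value for key, value in options.items() if value is not None}
    return json.dumps(set_options, sort_keys=True)
```

Calling `build_json_config(language="en", temp=0.0)` produces `{"language": "en", "temp": 0.0}`. Because the options are shared, the same payload should be usable on either transport.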

What’s transport-specific

WebSocket-only: real-time VAD step messages, in-stream `send_flush()` for forced processing, setup-message stream controls (`send_setup_on_start`, `wait_for_ready_on_start`). See WebSocket Stream Options.
REST-only: send the full audio as the request body; receive NDJSON over a streaming response.
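Since the REST response arrives as streamed NDJSON, a client has to split the body on newlines and parse each line as its own JSON document, even when a line is split across network chunks. The sketch below shows one way to do that with the standard library only. `iter_ndjson` is a hypothetical helper, and the `type` and `text` fields in the usage example are invented sample data, not the documented message schema.

```python
import json
from typing import Any, Iterable, Iterator


def iter_ndjson(chunks: Iterable[bytes]) -> Iterator[Any]:
    """Yield one parsed JSON value per newline-delimited line.

    `chunks` is any iterable of raw byte chunks, for example the body
    iterator of a streaming HTTP response. Chunk boundaries do not need
    to line up with line boundaries.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line currently held in the buffer.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield json.loads(line)
    # Handle a final line that was not newline-terminated.
    if buffer.strip():
        yield json.loads(buffer)


if __name__ == "__main__":
    # Sample chunks with made-up field names, split mid-line on purpose.
    sample = [b'{"type": "text", "te', b'xt": "hello"}\n{"type":', b' "end"}\n']
    for message in iter_ndjson(sample):
        print(message)
```

Buffering raw bytes and decoding only complete lines keeps multi-byte UTF-8 characters intact when a chunk boundary falls in the middle of one.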

Next steps

Use the WebSocket API

Streaming audio in, transcripts and VAD signals out, with flush control.

Use the REST API

One-shot transcription of a complete audio file.

Transcription settings

`language`, `temp`, `padding_bonus`, `delay_in_frames`.

Errors

Error contracts across REST, WebSocket, and streamed responses.
