Flow
- Ask the user for microphone permission.
- Capture audio with the Web Audio API.
- Convert float samples to 16-bit mono PCM.
- Send base64 audio chunks over wss://api.gradium.ai/api/speech/asr.
- Render text messages and use step messages for turn-taking.
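The conversion and encoding steps in this flow can be sketched as two small helpers. This is a minimal sketch, not Gradium's reference client: the function names floatTo16BitPCM and bytesToBase64 are hypothetical, and little-endian byte order is an assumption (it is the usual convention for raw PCM, but check it against the API reference). The JSON envelope that carries the base64 string is left out on purpose because it is not described here.

```typescript
// Hypothetical helper: convert Web Audio float samples in [-1, 1]
// to 16-bit signed PCM bytes. Little-endian order is an assumption.
function floatTo16BitPCM(samples: Float32Array): Uint8Array {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range samples do not wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Negative values scale by 32768, positive values by 32767.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buffer);
}

// Hypothetical helper: base64-encode raw bytes for a text WebSocket frame.
// btoa is available in browsers and in recent Node.js versions.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}
```

Each captured chunk would go through floatTo16BitPCM and then bytesToBase64 before it is sent over the socket.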
Browser Client
ScriptProcessorNode is easy to read but deprecated. For production,
prefer an AudioWorklet so audio capture stays reliable under UI load.
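An AudioWorklet processor receives audio in small render blocks (128 frames per call in current browsers), so it usually needs to buffer them into larger chunks before handing them to the main thread. The sketch below shows only that buffering step, under stated assumptions: the class name ChunkAccumulator and its callback are hypothetical, and the wiring into a real processor is described in comments rather than implemented.

```typescript
// Hypothetical helper: collects small blocks of float samples and emits
// fixed-size chunks. Inside an AudioWorkletProcessor, process(inputs) could
// call push(inputs[0][0]) and forward each chunk with this.port.postMessage.
class ChunkAccumulator {
  private readonly chunkSize: number;
  private readonly onChunk: (chunk: Float32Array) => void;
  private readonly buffer: Float32Array;
  private filled = 0;

  constructor(chunkSize: number, onChunk: (chunk: Float32Array) => void) {
    if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
      throw new Error("chunkSize must be a positive integer");
    }
    this.chunkSize = chunkSize;
    this.onChunk = onChunk;
    this.buffer = new Float32Array(chunkSize);
  }

  push(block: Float32Array): void {
    let offset = 0;
    while (offset < block.length) {
      // Copy as much of the block as fits into the current chunk.
      const n = Math.min(this.chunkSize - this.filled, block.length - offset);
      this.buffer.set(block.subarray(offset, offset + n), this.filled);
      this.filled += n;
      offset += n;
      if (this.filled === this.chunkSize) {
        // Emit a copy so the internal buffer can be reused safely.
        this.onChunk(this.buffer.slice());
        this.filled = 0;
      }
    }
  }
}
```

At 48 kHz, a chunk size of 3840 samples corresponds to 80 ms, or 30 render blocks of 128 frames.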
Audio Format Notes
- input_format: "pcm" means 24 kHz, 16-bit signed mono PCM.
- If your browser audio graph runs at 48 kHz, either resample to 24 kHz or send input_format: "pcm_48000".
- Send small chunks, around 80-100 ms, to keep latency low.
- Do not send compressed browser formats unless you explicitly set a supported Gradium input format such as opus.
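The resampling and chunk-size notes above can be illustrated with a little arithmetic. This is a rough sketch: downsampleByTwo and samplesPerChunk are hypothetical names, and averaging adjacent pairs is only a crude low-pass filter chosen for brevity. A dedicated resampler with a proper anti-aliasing filter would likely give cleaner audio, and sending pcm_48000 avoids resampling altogether.

```typescript
// Hypothetical helper: halve the sample rate (for example 48 kHz -> 24 kHz)
// by averaging each pair of adjacent samples. A trailing odd sample is dropped.
function downsampleByTwo(input: Float32Array): Float32Array {
  const out = new Float32Array(Math.floor(input.length / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = (input[2 * i] + input[2 * i + 1]) / 2;
  }
  return out;
}

// Hypothetical helper: number of mono samples in a chunk of the given duration.
function samplesPerChunk(sampleRate: number, chunkMs: number): number {
  return Math.round((sampleRate * chunkMs) / 1000);
}

// 80 ms at 24 kHz is 1920 samples, which is 3840 bytes of 16-bit PCM.
const samples80ms = samplesPerChunk(24000, 80);
const bytes80ms = samples80ms * 2;
```

At 48 kHz the same 80 ms chunk holds 3840 samples, so halving the rate before encoding also halves the payload size.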
Related
Speech-to-Text WebSocket
Message types, VAD, flushing, and direct WebSocket examples.
Turn-taking with VAD
Use semantic VAD to decide when a speaker has finished.