Telephony providers commonly send 8 kHz mono audio encoded as mu-law or A-law. Gradium supports those formats directly for STT and can produce telephony-friendly TTS output.
Recommended Formats
| Use case | Gradium format | Notes |
|---|---|---|
| Incoming PSTN audio to STT | ulaw_8000 or alaw_8000 | Use the encoding your telephony provider sends. |
| Incoming linear PCM to STT | pcm_8000 or pcm_16000 | 16-bit signed little-endian mono PCM. |
| Outgoing audio to PSTN | ulaw_8000 or alaw_8000 | Avoid resampling in your telephony bridge when possible. |
| Higher quality app audio | pcm_24000, pcm_48000, wav, or opus | Use outside PSTN constraints. |
STT WebSocket Setup
TTS WebSocket Setup
When you request mu-law output (ulaw_8000), `audio` messages contain base64-encoded mu-law chunks that can be forwarded to a telephony media stream.
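Telephony media streams often expect fixed 20 ms frames, while a decoded TTS chunk can be any length. The sketch below is one way to bridge the two, assuming you have already extracted the base64 string from an `audio` message. The `UlawFramer` class and its method names are hypothetical helpers, not part of any Gradium SDK, and the exact shape of the message is not shown here.

```python
import base64

ULAW_FRAME_BYTES = 160  # 20 ms of 8 kHz mu-law, one byte per sample
ULAW_SILENCE = 0xFF     # mu-law code for a zero-amplitude sample


class UlawFramer:
    """Hypothetical helper: turns base64 mu-law chunks into 20 ms frames."""

    def __init__(self, frame_bytes: int = ULAW_FRAME_BYTES) -> None:
        self.frame_bytes = frame_bytes
        self._pending = b""  # bytes left over from the previous chunk

    def feed(self, audio_b64: str) -> list:
        """Decode one base64 payload and return every complete frame."""
        data = self._pending + base64.b64decode(audio_b64)
        cut = len(data) - len(data) % self.frame_bytes
        self._pending = data[cut:]
        return [data[i:i + self.frame_bytes] for i in range(0, cut, self.frame_bytes)]

    def flush(self) -> bytes:
        """Pad the final partial frame with silence; empty if nothing is left."""
        if not self._pending:
            return b""
        frame = self._pending.ljust(self.frame_bytes, bytes([ULAW_SILENCE]))
        self._pending = b""
        return frame
```

Call `feed` for each incoming chunk and forward the returned frames to the provider, then call `flush` once at the end of the utterance so the last partial frame is not dropped.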
Chunk Size Guidance
Many telephony media streams send 20 ms frames. Gradium accepts small chunks, but batching to roughly 80 ms can reduce message overhead while keeping latency low:

| Format | 20 ms payload | 80 ms payload |
|---|---|---|
| ulaw_8000 / alaw_8000 | 160 bytes | 640 bytes |
| pcm_8000 | 320 bytes | 1280 bytes |
| pcm_16000 | 640 bytes | 2560 bytes |
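The payload sizes in the table follow from sample rate × bytes per sample × duration. As a minimal sketch, the code below reproduces that arithmetic and batches 20 ms provider frames into 80 ms chunks before they are sent on. `payload_bytes` and `FrameBatcher` are illustrative names, not Gradium APIs.

```python
from __future__ import annotations

# Sample rate (Hz) and bytes per mono sample for the formats in the table.
FORMATS = {
    "ulaw_8000": (8000, 1),   # 8-bit companded
    "alaw_8000": (8000, 1),   # 8-bit companded
    "pcm_8000": (8000, 2),    # 16-bit signed PCM
    "pcm_16000": (16000, 2),  # 16-bit signed PCM
}


def payload_bytes(fmt: str, duration_ms: int) -> int:
    """Size in bytes of `duration_ms` of mono audio in the given format."""
    sample_rate, bytes_per_sample = FORMATS[fmt]
    return sample_rate * bytes_per_sample * duration_ms // 1000


class FrameBatcher:
    """Accumulates small provider frames and emits fixed-size chunks."""

    def __init__(self, fmt: str, target_ms: int = 80) -> None:
        self.chunk_size = payload_bytes(fmt, target_ms)
        self._buffer = bytearray()

    def push(self, frame: bytes) -> list[bytes]:
        """Add one frame and return any complete chunks now available."""
        self._buffer.extend(frame)
        chunks = []
        while len(self._buffer) >= self.chunk_size:
            chunks.append(bytes(self._buffer[: self.chunk_size]))
            del self._buffer[: self.chunk_size]
        return chunks

    def flush(self) -> bytes:
        """Return the remaining bytes (possibly empty), e.g. at end of call."""
        rest = bytes(self._buffer)
        self._buffer.clear()
        return rest
```

With `ulaw_8000`, every fourth 160-byte frame pushed yields one 640-byte chunk. Buffering to 80 ms adds at most 60 ms of delay to the first frame of each batch, which is the trade-off against sending four times as many messages.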
Bridge Checklist
- Preserve the provider’s stream/session ID in your own logs.
- Use one Gradium STT session per caller audio stream.
- Set `input_format` to the actual bytes you forward.
- For TTS, request an output format your provider can play directly.
- Use STT `step` messages or your provider’s VAD to decide when to send the user’s turn to an agent.
- Close the WebSocket and start a new session if the call leg changes format.
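Two of the checklist items, matching `input_format` to the forwarded bytes and restarting the session on a format change, can be sketched as a small lookup. The provider encoding labels (`"mulaw"`, `"alaw"`, `"pcm"`) and the `CallLeg` type are assumptions made for illustration; substitute whatever your telephony provider reports in its stream metadata.

```python
from dataclasses import dataclass

# Hypothetical provider labels mapped to the Gradium format names listed above.
PROVIDER_TO_GRADIUM = {
    ("mulaw", 8000): "ulaw_8000",
    ("alaw", 8000): "alaw_8000",
    ("pcm", 8000): "pcm_8000",
    ("pcm", 16000): "pcm_16000",
}


def gradium_input_format(encoding: str, sample_rate: int) -> str:
    """Pick the format that matches the bytes the provider actually sends."""
    try:
        return PROVIDER_TO_GRADIUM[(encoding, sample_rate)]
    except KeyError:
        raise ValueError(
            f"no direct format for {encoding} at {sample_rate} Hz; "
            "convert the audio in the bridge before forwarding"
        ) from None


@dataclass
class CallLeg:
    """One caller audio stream and the format its STT session was opened with."""
    stream_id: str      # the provider's stream/session ID, kept for logging
    input_format: str


def needs_new_session(leg: CallLeg, encoding: str, sample_rate: int) -> bool:
    """True if the leg now sends a format other than the one the session uses."""
    return gradium_input_format(encoding, sample_rate) != leg.input_format
```

Raising on an unknown combination, rather than falling back to a default, keeps the bridge from silently labelling audio with a format it does not have.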
Related
- Speech-to-Text WebSocket: Real-time transcription with VAD and flush.
- Text-to-Speech WebSocket: Stream generated audio back to your telephony bridge.