This documentation covers the Gradium API, which exposes our Text-To-Speech and Speech-To-Text models. They offer low-latency, high-quality, natural-sounding output and best-in-class accuracy. For issues, questions, or feature requests, please contact us at support@gradium.ai.

Features

  • Multilingual: We currently support five languages for both Text-To-Speech and Speech-To-Text: English (en), French (fr), German (de), Spanish (es) and Portuguese (pt), with more languages to come.
  • Low-latency: Our servers are based in Europe and in the US, and our expected time-to-first-token is below 300 ms when streaming.
  • Voice selection: We provide a voice library with multiple voices to choose from in different languages. You can also clone voices instantaneously using a 10-second voice sample.
  • Semantic VAD: STT streams include step messages every 80 ms with inactivity probabilities across future horizons, so voice agents can decide when a speaker has actually finished.
  • Adaptive delay control: Tune delay_in_frames for the latency/quality tradeoff and use flush at turn boundaries to process buffered audio without waiting for natural silence.
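
To illustrate how a voice agent might use the Semantic VAD step messages described above, here is a minimal Python sketch of an end-of-turn check. It does not use the Gradium SDK: the `Step` class, its `inactivity_probs` field, the `end_of_turn_ms` helper, and the threshold values are all hypothetical and chosen for illustration only. The one value taken from this page is the 80 ms step interval. Consult the Speech-to-Text documentation for the actual message format.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence

STEP_MS = 80  # step interval stated in the feature list above


@dataclass
class Step:
    """Hypothetical stand-in for one STT step message (not the real schema)."""
    index: int                          # 0-based position of the step in the stream
    inactivity_probs: Sequence[float]   # one probability per future horizon


def end_of_turn_ms(
    steps: Iterable[Step],
    horizon: int = 0,          # which future horizon to read (assumed index)
    threshold: float = 0.8,    # assumed cut-off, tune for your application
    min_consecutive: int = 3,  # require a short run to avoid reacting to one step
) -> Optional[int]:
    """Return the stream time (ms) at which the turn is judged finished, else None."""
    run = 0
    for step in steps:
        if step.inactivity_probs[horizon] >= threshold:
            run += 1
            if run >= min_consecutive:
                # Time at the end of the step that completed the run.
                return (step.index + 1) * STEP_MS
        else:
            run = 0  # the speaker seems active again, start over
    return None


if __name__ == "__main__":
    probs = [0.1, 0.9, 0.9, 0.2, 0.9, 0.9, 0.9]
    stream = [Step(i, [p]) for i, p in enumerate(probs)]
    print(end_of_turn_ms(stream))
```

Requiring several consecutive high-probability steps trades a little latency (here three steps, or 240 ms) for fewer false end-of-turn detections. Once a turn boundary is detected, an agent could then issue the flush described above so buffered audio is processed without waiting for natural silence.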

Installation

Get started with the Gradium Python SDK

Text-to-Speech

Convert text to natural-sounding speech

Speech-to-Text

Transcribe audio to text in real-time

API Reference

Explore the full API reference