Introduction

This documentation covers the Gradium API, which exposes our Text-To-Speech and Speech-To-Text models. These models offer low-latency, high-quality, natural-sounding output and best-in-class accuracy. For issues, questions, or feature requests, please contact us at support@gradium.ai.

Features

Multilingual: We currently support five languages for both Text-To-Speech and Speech-To-Text: English (en), French (fr), German (de), Spanish (es), and Portuguese (pt), with more languages to come.
Low-latency: Our servers are based in Europe and in the US, and our expected time-to-first-token is below 300 ms when streaming.
Voice selection: We provide a voice library with multiple voices to choose from in different languages. You can also clone voices instantaneously using a 10-second voice sample.
Semantic VAD: STT streams include step messages every 80 ms with inactivity probabilities across future horizons, so voice agents can decide when a speaker has actually finished.
Adaptive delay control: Tune delay_in_frames for the latency/quality tradeoff and use flush at turn boundaries to process buffered audio without waiting for natural silence.
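
To make the Semantic VAD idea concrete, here is a minimal sketch of how a voice agent might turn a stream of step messages into an end-of-turn decision. It does not use the Gradium SDK. The `Step` class, its `inactivity_probs` field, and the `horizon`, `threshold`, and `min_steps` parameters are all hypothetical names chosen for illustration. Only the 80 ms step interval comes from the description above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence

STEP_MS = 80  # interval between step messages, as stated in the feature list


@dataclass
class Step:
    """Hypothetical shape of one step message (field name is an assumption)."""
    inactivity_probs: Sequence[float]  # one inactivity probability per future horizon


def end_of_turn_ms(
    steps: Iterable[Step],
    horizon: int = 0,        # which future horizon to look at (assumed indexing)
    threshold: float = 0.8,  # probability at or above which a step counts as inactive
    min_steps: int = 3,      # consecutive inactive steps required before deciding
) -> Optional[int]:
    """Return the elapsed stream time in ms at which the turn is judged finished.

    Returns None if the stream ends without enough consecutive inactive steps.
    """
    run = 0
    for index, step in enumerate(steps, start=1):
        if step.inactivity_probs[horizon] >= threshold:
            run += 1
        else:
            run = 0  # any active step resets the streak
        if run >= min_steps:
            return index * STEP_MS
    return None


if __name__ == "__main__":
    probs = [0.1, 0.9, 0.2, 0.9, 0.95, 0.85]
    stream = [Step(inactivity_probs=[p]) for p in probs]
    print(end_of_turn_ms(stream))
```

Requiring several consecutive inactive steps is one simple way to avoid cutting a speaker off during a brief pause. Once a decision like this fires, an agent could use flush at the turn boundary, as described above, rather than waiting for natural silence.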

Installation

Get started with the Gradium Python SDK

Text-to-Speech

Convert text to natural-sounding speech

Speech-to-Text

Transcribe audio to text in real-time

API Reference

Explore the full API reference
