Connect to this endpoint via WebSocket for real-time text-to-speech conversion with low-latency audio streaming.
Connection URL:
For Europe
wss://eu.api.gradium.ai/api/speech/tts
For the USA
wss://us.api.gradium.ai/api/speech/tts
Authentication: Include your API key in the WebSocket connection header:
x-api-key: your_api_key

| Direction | Message Type | Example |
|---|---|---|
| 🔵⬆️ Client→Server | Setup (first) | {"type": "setup", "voice_id": "YTpq7expH9539ERJ", "model_name": "default", "output_format": "wav"} |
| 🟢⬇️ Server→Client | Ready | {"type": "ready", "request_id": "uuid"} |
| 🔵⬆️ Client→Server | Text (stream) | {"type": "text", "text": "Hello, world!"} |
| 🟢⬇️ Server→Client | Audio (stream) | {"type": "audio", "audio": "base64..."} |
| 🟢⬇️ Server→Client | Text (stream) | {"type": "text", "text": "Hello", "start_s": 0.2, "stop_s": 0.6} |
| 🔵⬆️ Client→Server | EndOfStream | {"type": "end_of_stream"} |
| 🟢⬇️ Server→Client | EndOfStream | {"type": "end_of_stream"} |
| 🔴⬇️ Server→Client | Error | {"type": "error", "message": "Error description", "code": 1008} |
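
The client side of the message order in the table can be sketched as a small generator. This is a minimal illustration using only the standard library: `client_frames` is a hypothetical helper name, and it only builds the JSON frames. Sending them requires a WebSocket client of your choice.

```python
import json
from typing import Iterable, Iterator


def client_frames(voice_id: str, texts: Iterable[str],
                  output_format: str = "wav") -> Iterator[str]:
    """Yield client-to-server JSON frames in the documented order.

    Hypothetical helper: it only builds the frames. A real client should
    send the setup frame, wait for the server's "ready" message, and only
    then send the text frames.
    """
    # Setup must be the very first message on the connection.
    yield json.dumps({
        "type": "setup",
        "model_name": "default",
        "voice_id": voice_id,
        "output_format": output_format,
    })
    # One text frame per piece of input text.
    for text in texts:
        yield json.dumps({"type": "text", "text": text})
    # Tell the server that no more text will follow.
    yield json.dumps({"type": "end_of_stream"})


frames = list(client_frames("YTpq7expH9539ERJ", ["Hello, ", "world!"]))
```
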
Setup message:
Direction: Client → Server
Format: JSON Object
{
"type": "setup",
"model_name": "default",
"voice_id": "YTpq7expH9539ERJ",
"output_format": "wav"
}
Fields:
- type (string, required): Must be “setup”
- model_name (string, required): The TTS model to use (default: “default”)
- voice_id (string, required): Voice ID from the library (e.g., “YTpq7expH9539ERJ” for Emma’s voice) or a custom voice ID
- output_format (string, required): Audio format: “wav”, “pcm”, or “opus”

Important: This must be the very first message sent after connection. The server will close the connection if any other message is sent first.
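
Because the server closes the connection on a bad first message, it can help to validate the setup fields before connecting. The sketch below is one way to do that. `build_setup` and `KNOWN_OUTPUT_FORMATS` are hypothetical names, and the format set simply collects the values mentioned in this document, so the server may accept others.

```python
import json

# Output formats named in this document. Treat the set as an assumption:
# the server may accept values that are not listed here.
KNOWN_OUTPUT_FORMATS = frozenset({
    "wav", "pcm", "opus",
    "ulaw_8000", "alaw_8000", "pcm_8000", "pcm_16000", "pcm_24000",
})


def build_setup(voice_id: str, output_format: str = "wav",
                model_name: str = "default") -> str:
    """Return the setup message as a JSON string (hypothetical helper)."""
    if not voice_id:
        raise ValueError("voice_id is required")
    if output_format not in KNOWN_OUTPUT_FORMATS:
        raise ValueError(f"unknown output_format: {output_format!r}")
    return json.dumps({
        "type": "setup",
        "model_name": model_name,
        "voice_id": voice_id,
        "output_format": output_format,
    })
```
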
Ready message:
Direction: Server → Client
Format: JSON Object
{
"type": "ready",
"request_id": "550e8400-e29b-41d4-a716-446655440000"
}
Fields:
- type (string): Will be “ready”
- request_id (string): Unique identifier for the session

This message is sent by the server after receiving the setup message, indicating that the connection is ready to receive text messages.
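
A client would typically check for this message before sending any text. A minimal sketch, assuming the raw frame arrives as a JSON string (`parse_ready` is a hypothetical helper name):

```python
import json


def parse_ready(raw: str) -> str:
    """Return the request_id from a ready message, or raise if it is not one."""
    msg = json.loads(raw)
    if msg.get("type") != "ready":
        raise RuntimeError(f"expected 'ready', got {msg.get('type')!r}")
    return msg["request_id"]
```

Keeping the returned `request_id` in your logs makes it easier to refer to a specific session later.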
Text message:
Direction: Client → Server
Format: JSON Object
{
"type": "text",
"text": "Hello, world!"
}
Fields:
- type (string, required): Must be “text”
- text (string, required): The text to be converted to speech

Send text messages to be converted to speech. You can send multiple text messages in sequence. The server will stream audio back as it’s generated.
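
Since several text messages can be sent in sequence, one option is to split long input at sentence boundaries and send each piece as its own message. The sketch below is an illustration only: `text_frames` is a hypothetical helper, and the sentence-splitting rule is an assumption, not something the API requires. Each chunk keeps its trailing whitespace so that the chunks concatenate back to the original text.

```python
import json
import re


def text_frames(text: str) -> list[str]:
    """Split text after sentence-ending punctuation and wrap each chunk
    in a {"type": "text"} message (hypothetical helper)."""
    # Each match runs up to and including the punctuation and the
    # whitespace that follows it, or to the end of the input.
    chunks = re.findall(r".+?(?:[.!?]+\s+|$)", text, flags=re.S)
    # Skip chunks that contain only whitespace.
    return [json.dumps({"type": "text", "text": c}) for c in chunks if c.strip()]
```
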
Audio message:
Direction: Server → Client
Format: JSON Object
{
"type": "audio",
"audio": "base64_encoded_audio_data..."
}
Fields:
- type (string): Will be “audio”
- audio (string): Base64-encoded audio data in the requested format

When using "pcm" output format, the audio will adhere to the following
specifications:
When using the "wav" output format, the audio chunks are in WAV format,
at 48kHz, 16-bit signed integer mono.
When using the "opus" output format, the audio chunks use the Opus codec
wrapped in an Ogg container.
Alternative output formats include "ulaw_8000", "alaw_8000", "pcm_8000",
"pcm_16000", and "pcm_24000".
Important: Multiple audio messages will be streamed for each text message. Continue receiving until you detect the end of speech or receive a new message type.
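
Decoding the streamed audio can be sketched as follows, assuming each received frame is a JSON string. `collect_audio` is a hypothetical helper. Whether the concatenated chunks form a valid file for container formats such as "wav" or "opus" depends on how the server frames each chunk, which this section does not specify.

```python
import base64
import json
from typing import Iterable


def collect_audio(raw_messages: Iterable[str]) -> bytes:
    """Decode and concatenate the payloads of all audio messages."""
    audio = bytearray()
    for raw in raw_messages:
        msg = json.loads(raw)
        # Other message types (for example "text") are skipped here.
        if msg.get("type") == "audio":
            audio.extend(base64.b64decode(msg["audio"]))
    return bytes(audio)
```
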
Text message:
Direction: Server → Client
Format: JSON Object
{
"type": "text",
"text": "Hello",
"start_s": 0.2,
"stop_s": 0.6
}
Fields:
- type (string): Will be “text”
- text (string): The portion of text that has been generated into speech
- start_s (float): Start time in seconds of this text segment in the audio
- stop_s (float): Stop time in seconds of this text segment in the audio

The server sends text messages back to indicate which parts of the input text have been processed into speech as well as the associated timestamps in the audio stream.
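
These timestamps can be used for captions or word highlighting. A minimal sketch of parsing one such message into a typed record (`TimedText` and `parse_timed_text` are hypothetical names):

```python
import json
from dataclasses import dataclass


@dataclass
class TimedText:
    text: str
    start_s: float
    stop_s: float

    @property
    def duration_s(self) -> float:
        """Length of this segment in the audio stream, in seconds."""
        return self.stop_s - self.start_s


def parse_timed_text(raw: str) -> TimedText:
    """Parse a server-to-client text message into a TimedText record."""
    msg = json.loads(raw)
    if msg.get("type") != "text":
        raise ValueError(f"expected 'text', got {msg.get('type')!r}")
    return TimedText(msg["text"], float(msg["start_s"]), float(msg["stop_s"]))
```
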
EndOfStream message:
Direction: Client → Server and Server → Client
Format: JSON Object
{
"type": "end_of_stream"
}
The client sends this message once it has submitted all the text it wants
synthesized. The server then sends back the remaining audio until all the text
has been processed, sends its own EndOfStream message, and closes the
WebSocket connection.
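
On the receiving side, the shutdown sequence described above amounts to reading messages until the server's own `end_of_stream` arrives. A minimal sketch over an iterable of raw JSON frames (`messages_until_end` is a hypothetical helper; a real client would feed it frames from its WebSocket library):

```python
import json
from typing import Iterable, Iterator


def messages_until_end(raw_messages: Iterable[str]) -> Iterator[dict]:
    """Yield parsed server messages, stopping at end_of_stream."""
    for raw in raw_messages:
        msg = json.loads(raw)
        if msg.get("type") == "end_of_stream":
            # The server closes the connection after this message.
            return
        yield msg
```
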
When errors occur, the server sends an error message as JSON before closing the connection:
Error Message Format:
{
"type": "error",
"message": "Error description explaining what went wrong",
"code": 1008
}
Common Error Codes:
- 1008: Policy Violation (e.g., invalid API key, missing setup message)
- 1011: Internal Server Error (unexpected server-side error)
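
One way to surface these errors in client code is to convert an error message into an exception as soon as it is received. The sketch below is illustrative. `TTSError` and `raise_on_error` are hypothetical names, and the code table contains only the two codes listed above.

```python
import json

# Only the codes documented above. Other codes are reported as "Unknown".
ERROR_CODE_NAMES = {
    1008: "Policy Violation",
    1011: "Internal Server Error",
}


class TTSError(Exception):
    """Raised when the server sends a {"type": "error"} message."""

    def __init__(self, code, message):
        self.code = code
        self.message = message
        name = ERROR_CODE_NAMES.get(code, "Unknown")
        super().__init__(f"{code} ({name}): {message}")


def raise_on_error(raw: str) -> dict:
    """Parse a server message and raise TTSError if it is an error message."""
    msg = json.loads(raw)
    if msg.get("type") == "error":
        raise TTSError(msg.get("code"), msg.get("message", ""))
    return msg
```
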