Use the REST endpoint when you have a finished block of text and want a single HTTP request with no WebSocket to manage. The response is either raw audio bytes (for writing directly to a file) or a JSON stream that mirrors the WebSocket protocol.

Streaming use case?

For low-latency synthesis of long or generated text, streaming TTS via the SDK gives you audio chunks as they’re produced.

Quickstart

Setting only_audio: true returns the raw audio bytes directly, which is the easiest route for “text in, file out”:
curl -L -X POST https://api.gradium.ai/api/post/speech/tts \
  -H "x-api-key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, world!", "voice_id": "YTpq7expH9539ERJ", "output_format": "wav", "only_audio": true}' \
  > output.wav
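
The same request can be made from Python using only the standard library. This is a minimal sketch, not an official client: the helper names build_tts_request and synthesize_to_file are hypothetical, while the endpoint URL, headers, and body fields are copied from the cURL command above.

```python
import json
import urllib.request

TTS_URL = "https://api.gradium.ai/api/post/speech/tts"


def build_tts_request(text, voice_id, api_key, output_format="wav"):
    """Build the POST request from the cURL example (hypothetical helper)."""
    body = {
        "text": text,
        "voice_id": voice_id,
        "output_format": output_format,
        "only_audio": True,  # ask for raw audio bytes instead of a JSON stream
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text, voice_id, api_key, path, output_format="wav"):
    """Send the request and write the response body to `path` unchanged."""
    request = build_tts_request(text, voice_id, api_key, output_format)
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        while chunk := response.read(8192):
            out.write(chunk)
```

Calling synthesize_to_file("Hello, world!", "YTpq7expH9539ERJ", "your_api_key", "output.wav") would correspond to the cURL command. Keeping request construction separate from the network call makes the body and headers easy to inspect before anything is sent.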

Response modes

The only_audio field in the request body picks one of two response shapes:
  • only_audio: true: the response body is the raw audio in the requested output_format (WAV, PCM, Opus, …). Save it directly to a file or pipe it to a player. The Content-Type header reflects the format (for example audio/wav, audio/ogg, or audio/pcm).
  • only_audio: false (or omitted): the response is a JSON stream that uses the same message format as the WebSocket endpoint, with audio (base64), text (with timestamps), and error messages. Read the body line by line until the connection closes.
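
A client that saves raw audio can pick a file extension from the Content-Type header. The sketch below is illustrative only: audio_extension is a hypothetical helper, the table covers just the three media types named above, and the extensions are conventional choices rather than anything the API mandates.

```python
# Media types listed above, mapped to conventional file extensions (assumed).
AUDIO_EXTENSIONS = {
    "audio/wav": ".wav",
    "audio/ogg": ".ogg",
    "audio/pcm": ".pcm",
}


def audio_extension(content_type):
    """Return a file extension for a raw-audio Content-Type, else None."""
    # Drop parameters such as "; codecs=opus" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return AUDIO_EXTENSIONS.get(media_type)
```

A None result means the body is not one of the listed audio types, so it is worth inspecting the response before writing it out as an audio file.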

Streaming the JSON response

Pass only_audio: false and read the body as it arrives. With cURL, the -N (--no-buffer) flag prints each line as soon as the server sends it.
curl -N -L -X POST https://api.gradium.ai/api/post/speech/tts \
  -H "x-api-key: your_api_key" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, world!", "voice_id": "YTpq7expH9539ERJ", "output_format": "wav", "only_audio": false}'
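
Reading the stream line by line can be sketched in Python as follows. The message shape is an assumption, since this page names the message kinds (audio, text, error) but not their fields: the sketch assumes each line is a JSON object with a "type" key and that audio messages carry base64 data under an "audio" key. Check the reference for the actual schema. collect_audio and TTSStreamError are hypothetical names.

```python
import base64
import json


class TTSStreamError(RuntimeError):
    """Raised when the stream contains an error message."""


def collect_audio(lines):
    """Concatenate decoded audio from an iterable of JSON lines.

    Assumed message shape (not confirmed by this page):
    {"type": "audio", "audio": "<base64>"}, {"type": "text", ...},
    {"type": "error", ...}.
    """
    audio = bytearray()
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        message = json.loads(line)
        kind = message.get("type")
        if kind == "audio":
            audio += base64.b64decode(message["audio"])
        elif kind == "error":
            raise TTSStreamError(json.dumps(message))
        # "text" messages (timestamps) and unknown types are ignored here.
    return bytes(audio)
```

The function accepts any iterable of str or bytes lines, so the response object returned by urllib.request.urlopen, which yields lines as bytes when iterated, can be passed in directly. It simply joins the decoded payloads in arrival order; whether that yields a playable file for a given output_format depends on how the API chunks its audio.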
For the full request schema, supported output formats, and error shapes, see the TTS POST Endpoint reference.

Next steps

TTS POST API reference

Full request body schema, output formats, error contracts.

Streaming with the SDK

Chunked audio output, custom voices, flush, timestamps.