Use the REST endpoint when you have a complete audio file in hand and want a single HTTP request with no WebSocket to manage. The server reads the full request body, transcribes it, and streams newline-delimited JSON (NDJSON) messages back as the response body.

Streaming use case?

For live audio (microphone, telephony) or to react to VAD and flush events in real time, use the SDK streaming guide instead.

Quickstart

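import json
import requests

# Read the whole file into memory: the server reads the full body before transcribing.
with open("input.wav", "rb") as f:
    audio = f.read()

# stream=True lets us consume the NDJSON messages as they arrive.
with requests.post(
    "https://api.gradium.ai/api/post/speech/asr",
    data=audio,
    headers={
        "x-api-key": "your_api_key",
        "Content-Type": "audio/wav",
    },
    stream=True,
) as resp:
    resp.raise_for_status()
    transcript = []
    # Each non-empty line of the body is one JSON message.
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue
        msg = json.loads(line)
        # Keep only the transcribed text; other message types are ignored here.
        if msg["type"] == "text":
            transcript.append(msg["text"])
print(" ".join(transcript))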

Response format

The response has Content-Type: application/x-ndjson. Each line is a JSON object with a type field. The text and end_text types are the ones you’ll typically care about; an error message may appear if the pipeline fails mid-stream. The body closes when transcription is complete. For the full message schema, request body, query parameters, and error shapes, see the STT POST Endpoint reference.
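As an illustration of handling the message types described above, here is a minimal sketch of a parser that collects text messages and stops on an error message. The helper name collect_transcript and the sample lines are hypothetical, made up for this example. Only the type and text fields come from the Quickstart; the fields of end_text and error messages are not shown on this page, so the sketch does not read them.

```python
import json
from typing import Iterable, List


def collect_transcript(lines: Iterable[str]) -> str:
    """Join the text of every "text" message in an NDJSON stream."""
    parts: List[str] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        msg = json.loads(line)
        kind = msg.get("type")
        if kind == "text":
            parts.append(msg["text"])
        elif kind == "error":
            # The error payload's fields are an unknown here, so surface the whole message.
            raise RuntimeError(f"transcription failed mid-stream: {msg}")
        # Any other type (for example end_text) is ignored in this sketch.
    return " ".join(parts)


# Hypothetical sample lines, for illustration only (not real API output).
sample = [
    '{"type": "text", "text": "hello"}',
    '{"type": "end_text"}',
    "",
    '{"type": "text", "text": "world"}',
]
print(collect_transcript(sample))
```

In the Quickstart, the same function could be fed resp.iter_lines() directly. Raising on error rather than returning a partial transcript is a design choice of this sketch; you may prefer to keep whatever text arrived before the failure.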

Passing json_config

Advanced options (temp, language, padding_bonus, delay_in_frames) are passed as a JSON-encoded string in the json_config query parameter. The example below sends a language hint:
# json_config = {"language":"en"}, URL-encoded as %7B%22language%22%3A%22en%22%7D
curl -L -X POST \
  'https://api.gradium.ai/api/post/speech/asr?json_config=%7B%22language%22%3A%22en%22%7D' \
  -H "x-api-key: your_api_key" \
  -H "Content-Type: audio/wav" \
  --data-binary @input.wav
The supported keys, allowed values, and validation rules are documented in Transcription Settings.
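Hand-encoding the JSON string is error-prone, so the URL can instead be built programmatically. The following is a minimal sketch using only the Python standard library; the helper name build_asr_url is hypothetical, and the only key used is the language hint from the curl example above.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.gradium.ai/api/post/speech/asr"


def build_asr_url(config: dict) -> str:
    """Return the endpoint URL with config JSON-encoded into json_config."""
    # Compact separators avoid encoding unnecessary spaces into the query string.
    payload = json.dumps(config, separators=(",", ":"))
    # urlencode percent-encodes braces, quotes, and colons.
    return f"{BASE_URL}?{urlencode({'json_config': payload})}"


url = build_asr_url({"language": "en"})
print(url)
```

For the language hint this produces the same URL as the curl command. If you use requests, passing params={"json_config": json.dumps(config)} to requests.post should achieve the same result, since requests percent-encodes query parameters itself.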

Next steps

STT POST API reference

Full request/response schema, query parameters, content types.

Streaming with the SDK

Low-latency, microphone-friendly transcription with VAD and flush.