Speech-to-Text (REST)

Use the REST endpoint when you have a complete audio file in hand and want a single HTTP request with no WebSocket to manage. The server reads the full request body, transcribes it, and streams newline-delimited JSON (NDJSON) messages back in the response body.

Streaming use case?

For live audio (microphone, telephony), or when you need to react to voice activity detection (VAD) and flush events in real time, use the SDK streaming guide instead.

Quickstart

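import json
import requests

# Read the whole audio file; the REST endpoint expects the complete body.
with open("input.wav", "rb") as f:
    audio = f.read()

# stream=True lets us consume the NDJSON response line by line as it arrives.
with requests.post(
    "https://api.gradium.ai/api/post/speech/asr",
    data=audio,
    headers={
        "x-api-key": "your_api_key",
        "Content-Type": "audio/wav",
    },
    stream=True,
) as resp:
    resp.raise_for_status()
    transcript = []
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue  # skip blank lines
        msg = json.loads(line)  # one JSON object per line
        if msg["type"] == "text":
            transcript.append(msg["text"])

# Join the collected text segments into a single transcript.
print(" ".join(transcript))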

Response format

The response has Content-Type: application/x-ndjson. Each line is a JSON object with a type field. The text and end_text types are the ones you'll typically care about; an error message may appear if the pipeline fails mid-stream. The body closes when transcription is complete. For the full message schema, request body, query parameters, and error shapes, see the STT POST Endpoint reference.
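As a rough sketch of how a client might dispatch on the type field, the helper below collects text messages and raises when an error message appears. The function name `collect_transcript` and the sample lines are hypothetical, and the shape of error and end_text messages is an assumption; only the type field and the text field of text messages come from the Quickstart. Check the STT POST Endpoint reference for the actual schema.

```python
import json
from typing import Iterable


def collect_transcript(lines: Iterable[str]) -> str:
    """Join the text of every "text" message; raise if an "error" message appears."""
    pieces = []
    for line in lines:
        if not line:
            continue  # skip blank lines
        msg = json.loads(line)
        kind = msg.get("type")
        if kind == "text":
            pieces.append(msg["text"])
        elif kind == "error":
            # The error payload shape is assumed unknown here, so surface the whole message.
            raise RuntimeError(f"transcription failed: {msg}")
        # Any other type (for example end_text) is ignored in this sketch.
    return " ".join(pieces)


# Hypothetical sample lines, for illustration only.
sample = [
    '{"type": "text", "text": "hello"}',
    "",
    '{"type": "end_text"}',
    '{"type": "text", "text": "world"}',
]
print(collect_transcript(sample))
```

In the Quickstart, the same helper could be fed `resp.iter_lines(decode_unicode=True)` in place of the sample list.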

Passing `json_config`

Advanced options (temp, language, padding_bonus, delay_in_frames) are passed as a JSON-encoded string in the json_config query parameter. Because it travels in the URL, the JSON string must be URL-encoded. The example below sends a language hint:

# json_config = {"language":"en"}, URL-encoded as %7B%22language%22%3A%22en%22%7D
curl -L -X POST \
  'https://api.gradium.ai/api/post/speech/asr?json_config=%7B%22language%22%3A%22en%22%7D' \
  -H "x-api-key: your_api_key" \
  -H "Content-Type: audio/wav" \
  --data-binary @input.wav

The supported keys, allowed values, and validation rules are documented in Transcription Settings.
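To avoid encoding the query string by hand, the JSON can be serialized and URL-encoded in code. The sketch below uses only the Python standard library; `build_asr_url` and `BASE_URL` are illustrative names, not part of any SDK. Compact separators are used so the result matches the curl example byte for byte, though whether the server also accepts JSON with spaces is not stated here.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.gradium.ai/api/post/speech/asr"


def build_asr_url(config: dict) -> str:
    """Return the endpoint URL with config passed as a URL-encoded json_config."""
    # Compact separators avoid spaces, matching the hand-encoded curl example.
    encoded = json.dumps(config, separators=(",", ":"))
    return f"{BASE_URL}?{urlencode({'json_config': encoded})}"


print(build_asr_url({"language": "en"}))
```

With requests, passing `params={"json_config": json.dumps(config)}` to `requests.post` applies the same URL encoding, so building the URL manually is optional.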


Next steps

STT POST API reference

Streaming with the SDK
