Speech-to-Text

STT POST Endpoint

Use this HTTP POST endpoint for simple, one-shot speech-to-text transcription. Send the entire audio payload in the request body and receive a stream of newline-delimited JSON (NDJSON) messages with the transcription results.

Endpoint URL:

https://api.gradium.ai/api/post/speech/asr

Authentication: Include your API key in the request header:

Header: x-api-key: your_api_key

Quick Example

curl -L -X POST https://api.gradium.ai/api/post/speech/asr \
  -H "x-api-key: your_api_key" \
  -H "Content-Type: audio/wav" \
  --data-binary @input.wav

With a language hint:

curl -L -X POST "https://api.gradium.ai/api/post/speech/asr?json_config=%7B%22language%22%3A%22en%22%7D" \
  -H "x-api-key: your_api_key" \
  -H "Content-Type: audio/wav" \
  --data-binary @input.wav

Request Format

Method: POST

Body: Raw audio bytes (the full file).

The input audio format is selected from the Content-Type header:

Content-Type	Audio Format
`audio/wav` (default if header is missing)	WAV (PCM data, 16/24/32-bit)
`audio/pcm`	Raw PCM, 24 kHz, 16-bit signed little-endian, mono
`audio/ogg` or `audio/opus`	Ogg-wrapped Opus
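As a rough illustration of the table above, a client can pick the Content-Type header from the file name before uploading. The helper name `content_type_for` and the extension mapping are assumptions made for this sketch; the endpoint itself only looks at the header (or the `input_format` override), not at the file name.

```python
from pathlib import Path

# Extension -> Content-Type, using only the header values listed in the table.
# The extensions are an assumption of this sketch, not part of the API.
CONTENT_TYPES = {
    ".wav": "audio/wav",
    ".pcm": "audio/pcm",
    ".ogg": "audio/ogg",
    ".opus": "audio/opus",
}


def content_type_for(path):
    """Return the Content-Type to send for an audio file (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    try:
        return CONTENT_TYPES[suffix]
    except KeyError:
        # Fail early on the client instead of waiting for a server rejection.
        raise ValueError(f"no supported Content-Type for {suffix!r}") from None


print(content_type_for("input.wav"))
```

Note that a `.pcm` file is only valid here if it already matches the raw PCM layout in the table (24 kHz, 16-bit signed little-endian, mono); the sketch does not verify that.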

Query Parameters:

model (string, optional): The Speech-to-Text model to use (default: default).
input_format (string, optional): Override the input format detected from Content-Type. One of wav, pcm, opus.
json_config (string, optional): JSON-encoded model configuration. Common use case: pass a language hint, e.g. {"language": "en"}. The value should be URL-encoded when used as a query parameter.
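The URL-encoding step for `json_config` is easy to get wrong by hand, so here is a minimal sketch that builds the request URL with Python's standard library. The function name `build_asr_url` and its keyword arguments are hypothetical; only the query parameter names come from the list above.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.gradium.ai/api/post/speech/asr"


def build_asr_url(language=None, model=None, input_format=None):
    """Return the endpoint URL with optional query parameters (hypothetical helper)."""
    params = {}
    if model is not None:
        params["model"] = model
    if input_format is not None:
        params["input_format"] = input_format
    if language is not None:
        # json_config is a JSON string; compact separators keep the URL short.
        params["json_config"] = json.dumps(
            {"language": language}, separators=(",", ":")
        )
    if not params:
        return BASE_URL
    # urlencode percent-encodes the braces, quotes, and colon in the JSON.
    return f"{BASE_URL}?{urlencode(params)}"


print(build_asr_url(language="en"))
# https://api.gradium.ai/api/post/speech/asr?json_config=%7B%22language%22%3A%22en%22%7D
```

The printed URL matches the language-hint request shown in the Quick Example.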

Response Format

Content-Type: application/x-ndjson

The response body is a stream of newline-delimited JSON messages. Each line is a separate JSON object. Possible message types:

`text` — transcribed text segment

{"type": "text", "text": "Hello world", "start_s": 0.5, "stream_id": 0}

text (string): Transcribed text.
start_s (float): Start time of the segment in seconds.
stream_id (integer): Stream identifier when multiple text streams are in use (0 in single-stream transcription).

`end_text` — segment boundary

{"type": "end_text", "stop_s": 2.5, "stream_id": 0}

stop_s (float): End time of the previous text segment in seconds.
stream_id (integer): Stream identifier.

`error` — server-side error

{"type": "error", "message": "Error description"}

If the transcription pipeline fails, the server emits an error message and stops the stream.
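To show how the three message types fit together, the sketch below pairs each `text` message with the `end_text` that follows it on the same `stream_id`, producing segments with both a start and a stop time. The function name `collect_segments` is hypothetical, and the pairing rule (one `end_text` closes the most recent `text` on that stream) is an assumption based on the field descriptions above, not a documented guarantee.

```python
import json


def collect_segments(lines):
    """Pair each text message with the end_text that follows it on the same stream."""
    segments = []
    pending = {}  # stream_id -> segment still waiting for its end_text
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines between messages
        msg = json.loads(line)
        kind = msg.get("type")
        stream_id = msg.get("stream_id", 0)
        if kind == "text":
            segment = {
                "stream_id": stream_id,
                "text": msg["text"],
                "start_s": msg["start_s"],
                "stop_s": None,  # filled in when the matching end_text arrives
            }
            segments.append(segment)
            pending[stream_id] = segment
        elif kind == "end_text":
            segment = pending.pop(stream_id, None)
            if segment is not None:
                segment["stop_s"] = msg["stop_s"]
        elif kind == "error":
            # The server stops the stream after an error message.
            raise RuntimeError(msg["message"])
    return segments


sample = [
    '{"type": "text", "text": "Hello world", "start_s": 0.5, "stream_id": 0}',
    '{"type": "end_text", "stop_s": 2.5, "stream_id": 0}',
]
for seg in collect_segments(sample):
    print(seg)
```

A segment whose `stop_s` is still `None` after the stream closes never received an `end_text`; callers may want to treat that as a truncated result.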

Reading the Stream

The response is streamed: read the body line-by-line and parse each line as JSON. The body closes when transcription is complete.

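import json
import requests

# Read the whole file; the endpoint expects the full audio in one request body.
with open("input.wav", "rb") as f:
    audio = f.read()

with requests.post(
    "https://api.gradium.ai/api/post/speech/asr",
    data=audio,
    headers={
        "x-api-key": "your_api_key",
        "Content-Type": "audio/wav",
    },
    stream=True,  # consume the response as it arrives instead of buffering it
) as resp:
    # Failures before the stream starts arrive as an HTTP error status.
    resp.raise_for_status()
    transcript = []
    # Iterate over raw bytes: json.loads decodes UTF-8 itself, which avoids
    # relying on requests to guess a charset for application/x-ndjson.
    for line in resp.iter_lines():
        if not line:
            continue  # skip blank lines
        msg = json.loads(line)
        if msg["type"] == "text":
            transcript.append(msg["text"])
        elif msg["type"] == "error":
            # Errors after the stream has started are reported in-band.
            raise RuntimeError(msg["message"])
print(" ".join(transcript))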

Error Handling

If the request fails before the response stream has started, the server responds with HTTP 500 and a plain-text body. Two body shapes occur:

Upstream errors (with a numeric code) such as authentication failures or worker-level rejections:
```
error from server <code>: <reason>
```
For example, a revoked or expired API key returns `error from server 1008: API key is revoked or expired`.
Proxy-level rejections (e.g. unsupported Content-Type, malformed request body) come back as raw error strings without the error from server prefix:
```
unsupported content type for SST 'audio/mpeg'
```

In both cases the body is plain text (not JSON). Errors that occur after the NDJSON stream has started are surfaced as {"type": "error", "message": "..."} lines within the stream rather than as a different HTTP status.
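Because the two pre-stream error shapes differ only by the `error from server <code>:` prefix, a client can tell them apart with a single regular expression. The sketch below is one way to do that; the function name `parse_error_body` and the returned dictionary layout are assumptions made for illustration.

```python
import re

# Matches the upstream shape: "error from server <code>: <reason>".
_UPSTREAM = re.compile(r"^error from server (\d+): (.*)$", re.DOTALL)


def parse_error_body(body):
    """Classify a plain-text HTTP 500 body (hypothetical helper)."""
    text = body.strip()
    match = _UPSTREAM.match(text)
    if match:
        return {
            "source": "upstream",
            "code": int(match.group(1)),
            "reason": match.group(2),
        }
    # Anything without the prefix is treated as a proxy-level rejection.
    return {"source": "proxy", "code": None, "reason": text}


print(parse_error_body("error from server 1008: API key is revoked or expired"))
print(parse_error_body("unsupported content type for SST 'audio/mpeg'"))
```

With `requests`, this would typically be called on `resp.text` when `resp.status_code` is 500, before attempting to read any NDJSON lines.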

When to Use POST vs WebSocket

The POST endpoint is ideal for one-shot transcription of complete audio files already on disk or in memory. The audio is uploaded in a single request, transcription runs, and the results are streamed back as NDJSON.

Use the WebSocket endpoint instead when you need to:

Stream audio as it is being captured (microphone, telephony).
Receive partial transcripts and Voice Activity Detection (VAD) events in real time for turn-taking.
Send a flush message to force the model to emit buffered text on demand.

POST /api/post/speech/asr


Headers

x-api-key (string, required): Your Gradium API key.
Content-Type (enum<string>, optional): Format of the audio in the request body. Defaults to audio/wav when omitted. One of audio/wav, audio/pcm, audio/ogg, audio/opus.

Query Parameters

model (string, optional): Speech-to-Text model name (default: default).
input_format (enum<string>, optional): Overrides the audio format detected from Content-Type. One of wav, pcm, opus.
json_config (string, optional): JSON-encoded model configuration. Example: {"language": "en"}

Body

Raw audio bytes in the format indicated by Content-Type (WAV by default).

Response

NDJSON stream of transcription messages: text, end_text, or error. The body closes when transcription is complete.
