Use this HTTP POST endpoint for simple, one-shot speech-to-text transcription. Send the entire audio payload in the request body and receive a stream of newline-delimited JSON (NDJSON) messages with the transcription results.
Endpoint URL:
https://api.gradium.ai/api/post/speech/asr
Authentication: Include your API key in the request header:
x-api-key: your_api_key

Example request:
curl -L -X POST https://api.gradium.ai/api/post/speech/asr \
  -H "x-api-key: your_api_key" \
  -H "Content-Type: audio/wav" \
  --data-binary @input.wav
With a language hint:
curl -L -X POST "https://api.gradium.ai/api/post/speech/asr?json_config=%7B%22language%22%3A%22en%22%7D" \
  -H "x-api-key: your_api_key" \
  -H "Content-Type: audio/wav" \
  --data-binary @input.wav
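The URL-encoded `json_config` value in the request above can also be built programmatically rather than by hand. The following is a minimal sketch using only the Python standard library; the helper name `build_asr_url` is hypothetical and not part of any Gradium SDK.

```python
import json
from urllib.parse import urlencode

ASR_URL = "https://api.gradium.ai/api/post/speech/asr"


def build_asr_url(config=None):
    """Return the endpoint URL, with json_config appended when config is given.

    `build_asr_url` is an illustrative helper, not an official API.
    """
    if config is None:
        return ASR_URL
    # Compact separators keep spaces out of the JSON, so the encoded value
    # matches the form used in the curl example.
    encoded = urlencode({"json_config": json.dumps(config, separators=(",", ":"))})
    return f"{ASR_URL}?{encoded}"


print(build_asr_url({"language": "en"}))
```

With `{"language": "en"}` as input, the printed URL should match the one used in the curl command with a language hint.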
Method: POST
Body: Raw audio bytes (the full file).
The input audio format is selected from the Content-Type header:
| Content-Type | Audio Format |
|---|---|
| audio/wav (default if header is missing) | WAV (PCM data, 16/24/32-bit) |
| audio/pcm | Raw PCM, 24 kHz, 16-bit signed little-endian, mono |
| audio/ogg or audio/opus | Ogg-wrapped Opus |
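As an illustration of the table, the sketch below picks a Content-Type from a file extension and packs float samples into the raw layout described for audio/pcm (16-bit signed little-endian). Both helper names and the extension mapping are assumptions made for this example; the sketch does not resample, so the samples are assumed to be mono at 24 kHz already.

```python
import struct
from pathlib import Path

# Content-Type values come from the table; mapping them to file extensions
# is an assumption made for this example.
CONTENT_TYPES = {
    ".wav": "audio/wav",
    ".pcm": "audio/pcm",
    ".ogg": "audio/ogg",
    ".opus": "audio/opus",
}


def content_type_for(path):
    """Return the Content-Type header value for a file path (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix not in CONTENT_TYPES:
        raise ValueError(f"no known Content-Type for {suffix!r}")
    return CONTENT_TYPES[suffix]


def floats_to_pcm16le(samples):
    """Pack floats in [-1.0, 1.0] as 16-bit signed little-endian PCM bytes."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    return b"".join(struct.pack("<h", int(round(s * 32767))) for s in clipped)
```

The bytes returned by `floats_to_pcm16le` could be sent as the request body with `Content-Type: audio/pcm`, provided the source samples meet the sample rate and channel requirements in the table.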
Query Parameters:
- model (string, optional): The Speech-to-Text model to use (default: default).
- input_format (string, optional): Override the input format detected from Content-Type. One of wav, pcm, opus.
- json_config (string, optional): JSON-encoded model configuration. Common use case: pass a language hint, e.g. {"language": "en"}. The value should be URL-encoded when used as a query parameter.

Response Content-Type: application/x-ndjson
The response body is a stream of newline-delimited JSON messages. Each line is a separate JSON object. Possible message types:
- text: transcribed text segment
  {"type": "text", "text": "Hello world", "start_s": 0.5, "stream_id": 0}
  - text (string): Transcribed text.
  - start_s (float): Start time of the segment in seconds.
  - stream_id (integer): Stream identifier when multiple text streams are in use (0 in single-stream transcription).
- end_text: segment boundary
  {"type": "end_text", "stop_s": 2.5, "stream_id": 0}
  - stop_s (float): End time of the previous text segment in seconds.
  - stream_id (integer): Stream identifier.
- error: server-side error
  {"type": "error", "message": "Error description"}
  If the transcription pipeline fails, the server emits an error message and stops the stream.
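To show how the three message types fit together, here is a minimal sketch that folds NDJSON lines into timed segments. It assumes that each end_text message closes the most recent text message on the same stream_id, which is how the stop_s description reads; the `Segment` class and `collect_segments` function are hypothetical names used only for this example.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Segment:
    text: str
    start_s: float
    stop_s: Optional[float]  # stays None if no end_text arrives for the segment
    stream_id: int


def collect_segments(lines: Iterable[str]) -> List[Segment]:
    """Pair each text message with the following end_text on the same stream."""
    segments: List[Segment] = []
    open_by_stream = {}  # stream_id -> Segment still waiting for end_text
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between messages
        msg = json.loads(line)
        kind = msg.get("type")
        stream_id = msg.get("stream_id", 0)
        if kind == "text":
            seg = Segment(msg["text"], msg["start_s"], None, stream_id)
            segments.append(seg)
            open_by_stream[stream_id] = seg
        elif kind == "end_text":
            seg = open_by_stream.pop(stream_id, None)
            if seg is not None:
                seg.stop_s = msg["stop_s"]
        elif kind == "error":
            raise RuntimeError(msg["message"])
    return segments
```

Fed the two sample messages shown above, this would produce a single segment for "Hello world" running from 0.5 s to 2.5 s on stream 0.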
The response is streamed: read the body line-by-line and parse each line as JSON. The body closes when transcription is complete.
import json
import requests

# Read the whole file: the endpoint expects the full audio in one request body.
with open("input.wav", "rb") as f:
    audio = f.read()

with requests.post(
    "https://api.gradium.ai/api/post/speech/asr",
    data=audio,
    headers={
        "x-api-key": "your_api_key",
        "Content-Type": "audio/wav",
    },
    stream=True,  # consume the NDJSON response incrementally
) as resp:
    resp.raise_for_status()
    transcript = []
    for line in resp.iter_lines(decode_unicode=True):
        if not line:
            continue  # skip blank lines
        msg = json.loads(line)
        if msg["type"] == "text":
            transcript.append(msg["text"])
        elif msg["type"] == "error":
            raise RuntimeError(msg["message"])

print(" ".join(transcript))
If the request fails before the response stream has started, the server
responds with HTTP 500 and a plain-text body. Two body shapes occur:
Upstream errors (with a numeric code) such as authentication failures or worker-level rejections:
error from server <code>: <reason>
For example, a revoked or expired API key returns
error from server 1008: API key is revoked or expired.
Proxy-level rejections (e.g. unsupported Content-Type, malformed
request body) come back as raw error strings without the error from server prefix:
unsupported content type for SST 'audio/mpeg'
In both cases the body is plain text (not JSON). Errors that occur
after the NDJSON stream has started are surfaced as
{"type": "error", "message": "..."} lines within the stream rather
than as a different HTTP status.
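A client may want to tell the two plain-text body shapes apart, for example to react differently to an authentication failure. The sketch below is one possible way to do that with a regular expression; the function name `parse_error_body` is hypothetical, and the pattern is inferred only from the `error from server <code>: <reason>` shape shown above.

```python
import re
from typing import Optional, Tuple

# Matches the upstream shape "error from server <code>: <reason>".
_UPSTREAM_ERROR = re.compile(r"^error from server (\d+): (.*)$", re.DOTALL)


def parse_error_body(body: str) -> Tuple[Optional[int], str]:
    """Split a plain-text error body into (code, reason).

    Returns (None, body) for proxy-level errors, which carry no numeric code.
    """
    text = body.strip()
    match = _UPSTREAM_ERROR.match(text)
    if match:
        return int(match.group(1)), match.group(2)
    return None, text
```

This would typically be called on `resp.text` when the HTTP status indicates a failure, before any attempt to read the body as NDJSON.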
The POST endpoint is ideal for one-shot transcription of complete audio files already on disk or in memory. The audio is uploaded in a single request, transcription runs, and the results are streamed back as NDJSON.
Use the WebSocket endpoint instead when you need to:
- Send a flush message to force the model to emit buffered text on demand.
Parameter summary:
- x-api-key (header): Your Gradium API key.
- Content-Type (header): Format of the audio in the request body. Defaults to audio/wav when omitted. One of audio/wav, audio/pcm, audio/ogg, audio/opus.
- model (query): Speech-to-Text model name.
- input_format (query): Overrides the audio format detected from Content-Type. One of wav, pcm, opus.
- json_config (query): JSON-encoded model configuration. Example: {"language": "en"}
- Body: WAV audio file.
- Response: NDJSON stream of transcription messages (text, end_text, or error). The body closes when transcription is complete.