An S2S session is configured through its setup message: some fields
select the pipeline (models, voice, formats), and the translation
target is passed through json_config. In the Python SDK, json_config
is a dict; the SDK serializes it to a JSON string on the wire for you.
These options apply to the WebSocket API across all three SDK entry
points (s2s_realtime, s2s_stream, and the buffered s2s). For the
STT-only and TTS-only knobs, see Transcription Settings and Voice
Settings.
Setup fields
| Field | Type | Required | Effect |
|---|---|---|---|
| model_name | string | Yes | S2S model alias. Defaults to "default". |
| stt_model_name | string | No | Speech-to-text model used to transcribe the input. |
| tts_model_name | string | No | Text-to-speech model used to synthesize the output. |
| voice_id | string | Yes | Voice UID for the synthesized output. Must be a voice in the same language as target_language. See Voices. |
| input_format | string | Yes | pcm, wav, opus, ulaw_8000, alaw_8000, or an explicit PCM rate such as pcm_24000. For pcm, input is 24 kHz, 16-bit signed mono. |
| output_format | string | Yes | Same value set as input_format. For pcm, output is 48 kHz, 16-bit signed mono. |
| json_config | object or string | No | Advanced pipeline settings, see below. |
| client_req_id | string | No | Correlates multiplexed requests. See Multiplexing. |
| close_ws_on_eos | boolean | No | Defaults to true; set false to keep the socket open after end_of_stream. |
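
As a rough illustration of how these fields fit together, the sketch below assembles them into a plain dict and checks the two format fields before anything is sent. The helper names (build_setup_message, is_valid_format), the voice UID, and the pcm_<rate> pattern are assumptions made for this example; they are not part of the SDK, and any message framing the wire protocol adds around these fields is not shown.

```python
import json
import re

# Named formats listed in the table above.
NAMED_FORMATS = {"pcm", "wav", "opus", "ulaw_8000", "alaw_8000"}
# Assumption: an explicit PCM rate is written as pcm_<digits>, e.g. pcm_24000.
PCM_RATE_PATTERN = re.compile(r"pcm_\d+")


def is_valid_format(value: str) -> bool:
    """Return True for a named format or an explicit PCM rate."""
    return value in NAMED_FORMATS or PCM_RATE_PATTERN.fullmatch(value) is not None


def build_setup_message(voice_id, input_format, output_format,
                        target_language=None, model_name="default",
                        close_ws_on_eos=True):
    """Hypothetical helper: collect the setup fields into a JSON-ready dict."""
    for name, value in (("input_format", input_format),
                        ("output_format", output_format)):
        if not is_valid_format(value):
            raise ValueError(f"unsupported {name}: {value!r}")
    message = {
        "model_name": model_name,
        "voice_id": voice_id,
        "input_format": input_format,
        "output_format": output_format,
        "close_ws_on_eos": close_ws_on_eos,
    }
    if target_language is not None:
        # json_config is carried as a JSON string on the wire.
        message["json_config"] = json.dumps({"target_language": target_language})
    return message


# "example-voice-uid" is a placeholder, not a real voice.
setup = build_setup_message("example-voice-uid", "pcm", "pcm_24000",
                            target_language="fr")
print(json.dumps(setup, indent=2))
```

Rejecting an unknown format locally is a design convenience here: it surfaces a typo before the socket is opened rather than as a server-side error.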
json_config options
| Option | Type | Allowed values | Effect |
|---|---|---|---|
| target_language | string | "en", "fr", "de", "es", "pt" | Language to translate the speech into before synthesis. |
The input speech is translated into target_language before the
output audio is generated, and the text messages you receive back
carry the translated text. The voice_id you choose must be a voice in
this same language.
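
The two constraints above (an allowed target_language, and a voice in that same language) can be checked on the client before the setup message is sent. The following is a minimal sketch: check_translation_config and the EXAMPLE_VOICES catalog are hypothetical, and the voice UIDs are placeholders. It also assumes that a json_config without target_language needs no check.

```python
# Allowed values from the json_config options table.
ALLOWED_TARGET_LANGUAGES = {"en", "fr", "de", "es", "pt"}

# Hypothetical catalog mapping voice UID -> language code.
# Real UIDs and their languages come from the Voices page.
EXAMPLE_VOICES = {"voice-fr-1": "fr", "voice-en-1": "en"}


def check_translation_config(json_config: dict, voice_id: str, voices: dict) -> None:
    """Raise ValueError if the target language or the voice does not fit."""
    target = json_config.get("target_language")
    if target is None:
        # Assumption: no translation target means there is nothing to validate.
        return
    if target not in ALLOWED_TARGET_LANGUAGES:
        raise ValueError(f"unsupported target_language: {target!r}")
    voice_language = voices.get(voice_id)
    if voice_language is None:
        raise ValueError(f"unknown voice_id: {voice_id!r}")
    if voice_language != target:
        raise ValueError(
            f"voice {voice_id!r} speaks {voice_language!r}, "
            f"but target_language is {target!r}"
        )


# Passes silently: a French voice for a French target.
check_translation_config({"target_language": "fr"}, "voice-fr-1", EXAMPLE_VOICES)
```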
Passing json_config
The same json_config payload is sent regardless of which SDK entry
point you use; only the call shape differs.
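
Whatever the entry point, the part that stays constant is the payload itself. Since the setup field accepts either an object or a string, the sketch below shows one way to normalize both forms to the same wire string. to_wire_json_config is a hypothetical helper written for this example; it does not reproduce the SDK's own serialization or any entry-point signature.

```python
import json


def to_wire_json_config(json_config) -> str:
    """Hypothetical helper: return json_config as a compact JSON string."""
    if isinstance(json_config, str):
        # Already serialized: parse it so malformed JSON fails early.
        parsed = json.loads(json_config)
    elif isinstance(json_config, dict):
        parsed = json_config
    else:
        raise TypeError("json_config must be a dict or a JSON string")
    if not isinstance(parsed, dict):
        raise ValueError("json_config must encode a JSON object")
    # Sorted keys and compact separators make both input forms compare equal.
    return json.dumps(parsed, sort_keys=True, separators=(",", ":"))


from_dict = to_wire_json_config({"target_language": "de"})
from_string = to_wire_json_config('{"target_language": "de"}')
print(from_dict == from_string)
```

Building the dict once and reusing it across calls keeps the translation target identical no matter which entry point ends up sending it.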
Next steps
S2S WebSocket guide
Real-time, pull-based, and buffered S2S with the Python SDK.
S2S WebSocket reference
Complete wire-level schema: every message type, every field, every
error code.
Voices
Pick the voice used for the synthesized output.
Voice settings
TTS-side options for the synthesis stage.