A Speech-to-Speech (S2S) request translates spoken audio into another language. It is configured through the setup message: some fields select the pipeline (models, voice, formats), and the translation target is passed through json_config. In the Python SDK, json_config is a dict; the SDK serializes it to a JSON string on the wire for you. These options apply to the WebSocket API across all three SDK entry points (s2s_realtime, s2s_stream, and the buffered s2s). For the STT-only and TTS-only settings, see Transcription Settings and Voice Settings.
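To make the serialization step concrete, here is a minimal sketch of what "a dict in the SDK, a JSON string on the wire" could look like. The helper name build_setup_message and the exact envelope are assumptions for illustration; the SDK does this for you and its real wire format may differ.

```python
import json


def build_setup_message(setup: dict) -> str:
    """Hypothetical helper: encode a setup dict as a JSON message.

    Assumption: json_config travels as a JSON-encoded string nested
    inside the setup message. The real SDK handles this internally.
    """
    message = dict(setup)  # shallow copy so the caller's dict is untouched
    config = message.get("json_config")
    if isinstance(config, dict):
        # Serialize the nested dict to a string before encoding the message.
        message["json_config"] = json.dumps(config)
    return json.dumps(message)


wire = build_setup_message({
    "model_name": "default",
    "json_config": {"target_language": "en"},
})
decoded = json.loads(wire)
```

If you call the WebSocket API without the SDK, this is the step you would need to reproduce yourself when sending json_config as a string.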

Setup fields

| Field | Type | Required | Effect |
| --- | --- | --- | --- |
| model_name | string | Yes | S2S model alias. Defaults to "default". |
| stt_model_name | string | No | Speech-to-text model used to transcribe the input. |
| tts_model_name | string | No | Text-to-speech model used to synthesize the output. |
| voice_id | string | Yes | Voice UID for the synthesized output. Must be a voice in the same language as target_language. See Voices. |
| input_format | string | Yes | pcm, wav, opus, ulaw_8000, alaw_8000, or an explicit PCM rate such as pcm_24000. For pcm, input is 24 kHz, 16-bit signed mono. |
| output_format | string | Yes | Same value set as input_format. For pcm, output is 48 kHz, 16-bit signed mono. |
| json_config | object or string | No | Advanced pipeline settings, see below. |
| client_req_id | string | No | Correlates multiplexed requests. See Multiplexing. |
| close_ws_on_eos | boolean | No | Defaults to true; set false to keep the socket open after end_of_stream. |
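When you stream raw pcm input, it helps to know how many bytes correspond to a given duration of audio. The sketch below works that out from the input format stated in the table (24 kHz, 16-bit signed mono). The 20 ms chunk duration is an assumption chosen for illustration, not a documented requirement.

```python
def pcm_chunk_bytes(
    sample_rate_hz: int,
    chunk_ms: int,
    bytes_per_sample: int = 2,  # 16-bit samples
    channels: int = 1,          # mono
) -> int:
    """Return the number of bytes in chunk_ms of raw PCM audio."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * bytes_per_sample * channels


# 24 kHz, 16-bit mono input, as documented for input_format="pcm".
bytes_per_second = pcm_chunk_bytes(24_000, 1000)  # 48000
bytes_per_20ms = pcm_chunk_bytes(24_000, 20)      # 960
```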

json_config options

| Option | Type | Allowed values | Effect |
| --- | --- | --- | --- |
| target_language | string | "en", "fr", "de", "es", "pt" | Language to translate the speech into before synthesis. |
The transcribed text is translated into target_language before the output audio is generated, and the text messages you receive back carry the translated text. The voice_id you choose must be a voice in this same language.
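A client-side check can catch an unsupported target_language or a mismatched voice before the request is sent. The following is a minimal sketch: validate_json_config is a hypothetical helper, and voice_language stands for a language code you have looked up for your voice_id yourself (how you obtain it is outside this sketch).

```python
from typing import Optional

# Allowed values taken from the json_config options table.
ALLOWED_TARGET_LANGUAGES = {"en", "fr", "de", "es", "pt"}


def validate_json_config(config: dict, voice_language: Optional[str] = None) -> dict:
    """Hypothetical pre-flight check for a json_config dict."""
    target = config.get("target_language")
    if target not in ALLOWED_TARGET_LANGUAGES:
        raise ValueError(
            f"target_language must be one of {sorted(ALLOWED_TARGET_LANGUAGES)}, got {target!r}"
        )
    # The voice must be in the same language as target_language.
    if voice_language is not None and voice_language != target:
        raise ValueError(
            f"voice language {voice_language!r} does not match target_language {target!r}"
        )
    return config


config = validate_json_config({"target_language": "en"}, voice_language="en")
```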

Passing json_config

The same json_config payload is sent regardless of which SDK entry point you use; only the call shape differs:
config = {"target_language": "en"}

# Real-time: setup is keyword arguments; json_config stays nested.
async with client.s2s_realtime(
    model_name="default",
    input_format="pcm",
    output_format="pcm",
    voice_id="YTpq7expH9539ERJ",
    json_config=config,
) as s2s:
    ...

# Pull-based / buffered: setup is a dict containing json_config.
setup = {
    "model_name": "default",
    "input_format": "pcm",
    "output_format": "wav",
    "voice_id": "YTpq7expH9539ERJ",
    "json_config": config,
}
stream = await client.s2s_stream(setup, audio_generator(audio_data))
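The audio_generator used above is not defined on this page. One plausible shape, assuming s2s_stream accepts an async iterator of raw byte chunks (an assumption, not confirmed here), is a generator that slices a buffer into fixed-size pieces. The default chunk_size of 960 bytes is illustrative; it corresponds to 20 ms of 24 kHz, 16-bit mono PCM.

```python
import asyncio
from typing import AsyncIterator


async def audio_generator(audio_data: bytes, chunk_size: int = 960) -> AsyncIterator[bytes]:
    """Yield audio_data in chunks of at most chunk_size bytes."""
    for start in range(0, len(audio_data), chunk_size):
        yield audio_data[start:start + chunk_size]
        # Give other tasks a chance to run between chunks.
        await asyncio.sleep(0)


async def collect_chunks(audio_data: bytes) -> list:
    """Small driver that drains the generator, used here for demonstration."""
    return [chunk async for chunk in audio_generator(audio_data)]


chunks = asyncio.run(collect_chunks(b"\x00" * 2000))
```

The final chunk may be shorter than chunk_size when the buffer length is not an exact multiple of it.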

Next steps

S2S WebSocket guide

Real-time, pull-based, and buffered S2S with the Python SDK.

S2S WebSocket reference

Complete wire-level schema: every message type, every field, every error code.

Voices

Pick the voice used for the synthesized output.

Voice settings

TTS-side options for the synthesis stage.