Advanced Options

Some models support advanced options that can be passed using the json_config parameter. In the Python API, this parameter is a dictionary mapping strings to values (either floats or strings). It can be used to control:

Speed of the generated speech via the padding_bonus parameter.
Stability of the generated speech via the temp (temperature) parameter.
Voice similarity using the cfg_coef parameter.
Rewrite rules using rewrite_rules.
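
As a rough illustration, the options listed above can be collected into one dictionary before it is passed as json_config. The helper build_json_config below is hypothetical and not part of the Gradium client; it only checks the numeric ranges documented on this page, and it assumes that several options may be combined in a single dictionary.

```python
from typing import Optional

# Inclusive (min, max) ranges as documented on this page.
RANGES = {
    "padding_bonus": (-4.0, 4.0),
    "temp": (0.0, 1.4),
    "cfg_coef": (1.0, 4.0),
}


def build_json_config(
    padding_bonus: Optional[float] = None,
    temp: Optional[float] = None,
    cfg_coef: Optional[float] = None,
    rewrite_rules: Optional[str] = None,
) -> dict:
    """Hypothetical helper: keep only the options that were set, after a range check."""
    config = {}
    numeric = {"padding_bonus": padding_bonus, "temp": temp, "cfg_coef": cfg_coef}
    for name, value in numeric.items():
        if value is None:
            continue  # option not set, so the server default applies
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        config[name] = float(value)
    if rewrite_rules is not None:
        config["rewrite_rules"] = rewrite_rules
    return config


config = build_json_config(padding_bonus=-1.0, temp=0.5, rewrite_rules="en")
setup = {"voice_id": "YTpq7expH9539ERJ", "output_format": "wav", "json_config": config}
```

The resulting setup dictionary has the same shape as the ones passed to client.tts elsewhere on this page.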

Speed Control

You can guide the speaking speed with the padding_bonus parameter. The default value is 0.0. Negative values (between -4.0 and -0.1) make the speaker talk faster; positive values (between 0.1 and 4.0) make the speaker talk slower.

sample_text = "Hello, this is a test from the Gradium Text to Speech system. We are testing the speed."

# Positive padding_bonus: slower speech
slower_audio = await client.tts(
    setup={'voice_id': 'YTpq7expH9539ERJ', 'output_format': 'wav', 'json_config': {'padding_bonus': 2.0}},
    text=sample_text,
)

# Negative padding_bonus: faster speech
faster_audio = await client.tts(
    setup={'voice_id': 'YTpq7expH9539ERJ', 'output_format': 'wav', 'json_config': {'padding_bonus': -2.0}},
    text=sample_text,
)

Temperature Control

The temp parameter sets the generation temperature, with values ranging from 0 to 1.4. A value of 0 gives deterministic generation, while higher values lead to more diverse outputs. The default value is 0.7.

setup = {'voice_id': 'YTpq7expH9539ERJ', 'output_format': 'wav', 'json_config': {'temp': 0.3}}

audio = await client.tts(text=sample_text, setup=setup)

Voice Similarity Control

The cfg_coef parameter controls how closely the generated speech matches the target voice. Values range from 1.0 to 4.0, and the default value is 2.0. The higher the value, the more closely the model replicates the cloned voice, but larger values can lead to audio artifacts.
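
Because higher values trade similarity against artifacts, one way to pick a value is to generate the same text at several settings and compare by ear. The sketch below only builds the setup dictionaries for such a sweep. It assumes cfg_coef is passed inside json_config like the other options on this page, and setup_with_cfg_coef is a hypothetical helper, not part of the client.

```python
VOICE_ID = "YTpq7expH9539ERJ"


def setup_with_cfg_coef(cfg_coef: float) -> dict:
    """Hypothetical helper: a TTS setup dictionary with only cfg_coef set."""
    return {
        "voice_id": VOICE_ID,
        "output_format": "wav",
        "json_config": {"cfg_coef": cfg_coef},
    }


# Values spanning the documented 1.0 to 4.0 range; 2.0 is the default.
sweep = [setup_with_cfg_coef(value) for value in (1.0, 2.0, 3.0, 4.0)]

# Each setup would then be passed to the client, for example:
#     audio = await client.tts(text=sample_text, setup=sweep[2])
```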

Rewrite Rules

The rewrite_rules parameter can be used to pass text rewriting rules that are applied before the text is synthesized. The rules should be passed as a string. More details on the rules themselves can be found in the Text Rewriting Rules guide. Values such as "en", "fr", "de", "es", "pt" enable all the rewriting rules for a given language.
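
A minimal sketch of enabling the rules for one language follows, assuming rewrite_rules is passed as a string inside json_config like the other options. The helper rewrite_setup is hypothetical, and the tuple of codes only repeats the examples given above, so it may not be exhaustive.

```python
# Language codes mentioned on this page; other values may also be accepted.
LISTED_LANGUAGE_CODES = ("en", "fr", "de", "es", "pt")


def rewrite_setup(voice_id: str, rules: str) -> dict:
    """Hypothetical helper: a TTS setup dictionary with rewrite_rules set."""
    return {
        "voice_id": voice_id,
        "output_format": "wav",
        "json_config": {"rewrite_rules": rules},
    }


# Enable all French rewriting rules before synthesis.
french_setup = rewrite_setup("YTpq7expH9539ERJ", "fr")

# The setup would then be passed to the client, for example:
#     audio = await client.tts(text="Rendez-vous le 3 mars.", setup=french_setup)
```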

WebSocket Stream Options

When using the real-time WebSocket API (tts_realtime or stt_realtime), you can control connection initialization behavior with two parameters:

`send_setup_on_start`

Controls whether the setup message is automatically sent when the context manager is entered. Defaults to True. Set this to False when you need to send setup manually, for example when using multiplexing where each request has its own setup with a unique client_req_id.

# Setup sent automatically (default)
async with client.tts_realtime(voice_id="YTpq7expH9539ERJ", output_format="pcm") as stream:
    await stream.send_text("Hello")

# Setup sent manually
async with client.tts_realtime(send_setup_on_start=False) as stream:
    await stream.send_setup({"voice_id": "YTpq7expH9539ERJ", "output_format": "pcm"})
    await stream.send_text("Hello")

`wait_for_ready_on_start`

Controls whether the client blocks waiting for the server’s ready message after sending setup. Defaults to False. When set to False, the ready message is captured lazily during the normal receive loop. This reduces connection latency since you can start sending data immediately after setup without waiting for a round-trip. When set to True, stream.ready is guaranteed to be populated before you start sending data, which can be useful if you need server-provided metadata (like sample rate) before proceeding.

# Non-blocking (default) - ready captured lazily during recv
async with client.tts_realtime(
    voice_id="YTpq7expH9539ERJ",
    output_format="pcm"
) as stream:
    # stream.ready is None here - start sending immediately
    await stream.send_text("Hello")
    await stream.send_eos()

    async for msg in stream:
        # stream.ready gets populated when the ready message arrives
        if msg["type"] == "audio":
            process_audio(msg["audio"])

# Blocking - wait for ready before sending
async with client.tts_realtime(
    voice_id="YTpq7expH9539ERJ",
    output_format="pcm",
    wait_for_ready_on_start=True
) as stream:
    # stream.ready is populated here
    print(f"Server ready: {stream.ready}")
    await stream.send_text("Hello")

Both parameters work identically for stt_realtime:

async with client.stt_realtime(
    model_name="default",
    input_format="pcm",
    wait_for_ready_on_start=True
) as stream:
    print(f"Server ready: {stream.ready}")
    await stream.send_audio(audio_data)
