Voice Settings

TTS models accept advanced options via the json_config parameter. In the Python SDK, this is a dict mapping option name to value (float or string). When using the REST endpoints, pass it as a URL-encoded JSON string in the query parameters. These options apply to both the WebSocket and REST transports. For STT, see Transcription Settings.
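As a minimal sketch of the REST encoding described above, the snippet below serialises a config dict to JSON and URL-encodes it with the Python standard library. The helper name `encode_json_config` is illustrative and not part of the SDK. The endpoint path and any other query parameters are left out because they are covered in the TTS POST reference.

```python
import json
from urllib.parse import parse_qs, urlencode


def encode_json_config(config: dict) -> str:
    """Return a query-string fragment carrying `config` as URL-encoded JSON.

    Hypothetical helper for illustration; not part of the SDK.
    """
    # Compact separators keep the encoded string short.
    payload = json.dumps(config, separators=(",", ":"))
    # urlencode percent-escapes braces, quotes, colons and commas.
    return urlencode({"json_config": payload})


query = encode_json_config({"temp": 0.3, "padding_bonus": -1.0})

# Round-trip check: decoding the fragment recovers the original dict.
decoded = json.loads(parse_qs(query)["json_config"][0])
print(query)
print(decoded)
```

Append the resulting fragment to the request URL's query string alongside the other parameters the endpoint expects.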

Quick reference

| Parameter | Range | Default | Effect |
| --- | --- | --- | --- |
| `temp` | `0.0`–`1.4` | `0.7` | Sampling temperature. `0.0` is deterministic; higher values produce more diverse output. |
| `cfg_coef` | `1.0`–`4.0` | `2.0` | Voice similarity. Higher values stay closer to the target voice; very high values can introduce artifacts. |
| `padding_bonus` | `-4.0`–`4.0` | `0.0` | Speech speed. Negative values are faster, positive values are slower. |
| `rewrite_rules` | string | none | Text-rewriting rules applied before synthesis. See Text Rewriting Rules. |
| `pronunciation_id` | string | none | A pronunciation dictionary ID, applied per request. See Pronunciations. |

For deterministic output, set temp to 0.0. For multi-utterance flows on a single session, see Multiplexing. The TTS engine recognises the <flush> and <break time="..." /> tags described in Text-to-Speech.
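The documented ranges can also be checked on the client before a request is sent. The following is a sketch only: `validate_json_config` is a hypothetical helper, the ranges are copied from the quick reference, and rejecting unlisted option names is an assumption made here (this page does not say how the service treats unknown or out-of-range values).

```python
# Ranges copied from the quick reference; the helper itself is hypothetical.
NUMERIC_RANGES = {
    "temp": (0.0, 1.4),
    "cfg_coef": (1.0, 4.0),
    "padding_bonus": (-4.0, 4.0),
}
STRING_OPTIONS = {"rewrite_rules", "pronunciation_id"}


def validate_json_config(config: dict) -> dict:
    """Raise ValueError if an option is unlisted or outside its documented range."""
    for name, value in config.items():
        if name in NUMERIC_RANGES:
            low, high = NUMERIC_RANGES[name]
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"{name} must be a number, got {value!r}")
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}..{high}")
        elif name in STRING_OPTIONS:
            if not isinstance(value, str):
                raise ValueError(f"{name} must be a string, got {value!r}")
        else:
            # Assumption: treat anything not in the quick reference as an error.
            raise ValueError(f"unknown option: {name}")
    return config


config = validate_json_config({"temp": 0.0, "cfg_coef": 2.5})
```

Validating locally turns a typo such as `padding_bonus: 40` into an immediate error instead of a confusing result at synthesis time.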

Speed control

Use the padding_bonus parameter to control speaking speed. The default is 0.0. Negative values (-4.0 to -0.1) make the speaker talk faster; positive values (0.1 to 4.0) make the speaker talk slower.

# Assumes `client` is an initialised SDK client and that this runs inside an async function.
sample_text = "Hello, this is a test from the Gradium Text to Speech system. We are testing the speed."

# Positive padding_bonus: slower speech.
slower_audio = await client.tts(
    setup={'voice_id': 'YTpq7expH9539ERJ', 'output_format': 'wav', 'json_config': {'padding_bonus': 2.0}},
    text=sample_text,
)

# Negative padding_bonus: faster speech.
faster_audio = await client.tts(
    setup={'voice_id': 'YTpq7expH9539ERJ', 'output_format': 'wav', 'json_config': {'padding_bonus': -2.0}},
    text=sample_text,
)

Temperature control

The temp parameter sets the sampling temperature, from 0.0 to 1.4. A value of 0.0 gives deterministic generation, while higher values give more diverse output. The default is 0.7.

# temp=0.3 is below the 0.7 default, so output is less diverse.
setup = {'voice_id': 'YTpq7expH9539ERJ', 'output_format': 'wav', 'json_config': {'temp': 0.3}}

audio = await client.tts(text=sample_text, setup=setup)

Voice similarity control

The cfg_coef parameter controls how closely the generated speech matches the target voice. Values range from 1.0 to 4.0; the default is 2.0. Higher values replicate the cloned voice more closely, but very high values can introduce audio artifacts.
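A minimal sketch of setting cfg_coef, following the `setup` dict shape used in the other examples on this page. The helper `similarity_setup` is hypothetical, and the client-side range check is an added convenience, not documented SDK behaviour.

```python
def similarity_setup(voice_id: str, cfg_coef: float) -> dict:
    """Build a `setup` dict with `cfg_coef` set (hypothetical helper)."""
    # Range taken from the text above: 1.0 to 4.0 inclusive.
    if not 1.0 <= cfg_coef <= 4.0:
        raise ValueError("cfg_coef must be between 1.0 and 4.0")
    return {
        "voice_id": voice_id,
        "output_format": "wav",
        "json_config": {"cfg_coef": cfg_coef},
    }


setup = similarity_setup("YTpq7expH9539ERJ", 3.0)

# With an initialised SDK client, following the call shape used on this page:
# audio = await client.tts(setup=setup, text="Hello")
```

If artifacts appear at a high setting, stepping back toward the 2.0 default is a reasonable first adjustment.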

Rewrite rules

The rewrite_rules parameter passes text rewriting rules that are applied before the text is synthesized. Pass the rules as a string. Language codes such as "en", "fr", "de", "es", and "pt" enable all rewriting rules for that language. See the Text Rewriting Rules guide for details on the rules themselves.
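As an illustration, the sketch below adds a rewrite_rules string to an existing config without mutating it. `with_rewrite_rules` is a hypothetical helper, and rejecting an empty string is an assumption made here, not a documented constraint.

```python
def with_rewrite_rules(json_config: dict, rules: str) -> dict:
    """Return a copy of `json_config` with `rewrite_rules` set (hypothetical helper)."""
    # Assumption: an empty rules string is treated as a caller mistake.
    if not isinstance(rules, str) or not rules:
        raise ValueError("rewrite_rules must be a non-empty string")
    # Build a new dict so the caller's config is left untouched.
    return {**json_config, "rewrite_rules": rules}


base = {"temp": 0.3}
config = with_rewrite_rules(base, "fr")  # enable all French rewriting rules

setup = {"voice_id": "YTpq7expH9539ERJ", "output_format": "wav", "json_config": config}
# With an initialised SDK client:
# audio = await client.tts(setup=setup, text="Hello")
```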

Passing `json_config`

config = {"temp": 0.3, "cfg_coef": 2.5, "padding_bonus": -1.0}

# Streaming over WebSocket: setup is keyword args; json_config stays nested.
async with client.tts_realtime(
    voice_id="YTpq7expH9539ERJ",
    output_format="pcm",
    json_config=config,
) as stream:
    ...

# One-shot via SDK: setup is a dict containing json_config.
audio = await client.tts(
    setup={"voice_id": "YTpq7expH9539ERJ", "output_format": "wav", "json_config": config},
    text="Hello",
)

For REST, see the TTS POST reference.
