TTS models accept advanced options via the json_config parameter. In the Python SDK, this is a dict mapping option name to value (float or string). When using the REST endpoints, pass it as a URL-encoded JSON string in the query parameters. These options apply to both the WebSocket and REST transports. For STT, see Transcription Settings.
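
As a rough sketch of the REST encoding described above, the snippet below serializes a config dict to JSON and URL-encodes it with the Python standard library only. The helper name `encode_json_config` is hypothetical, and the snippet builds just the `json_config` query-string fragment; the endpoint URL and any other query parameters are not shown and should be taken from the REST reference.

```python
import json
from urllib.parse import parse_qs, urlencode


def encode_json_config(config: dict) -> str:
    """Return a `json_config=...` query-string fragment (hypothetical helper)."""
    # Compact separators keep the encoded string short.
    payload = json.dumps(config, separators=(",", ":"))
    # urlencode percent-encodes the braces, quotes, colons, and commas in the JSON.
    return urlencode({"json_config": payload})


query = encode_json_config({"temp": 0.3, "padding_bonus": -1.0})
print(query)

# Round-trip check: decoding the query string recovers the original dict.
decoded = json.loads(parse_qs(query)["json_config"][0])
```

Most HTTP clients perform this percent-encoding themselves when given a params mapping, in which case only the `json.dumps` step is needed.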

Quick reference

| Parameter | Range | Default | Effect |
| --- | --- | --- | --- |
| temp | 0.0 to 1.4 | 0.7 | Sampling temperature. 0.0 is deterministic; higher values produce more diverse output. |
| cfg_coef | 1.0 to 4.0 | 2.0 | Voice similarity. Higher values stay closer to the target voice; very high values can introduce artifacts. |
| padding_bonus | -4.0 to 4.0 | 0.0 | Speech speed. Negative values are faster, positive values are slower. |
| rewrite_rules | string | none | Text-rewriting rules applied before synthesis. See Text Rewriting Rules. |
| pronunciation_id | string | none | A pronunciation dictionary ID, applied per request. See Pronunciations. |

For deterministic output, set temp to 0.0. For multi-utterance flows on a single session, see Multiplexing. The TTS engine recognises the <flush> and <break time="..." /> tags described in Text-to-Speech.
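
The ranges in the quick reference can be checked on the client before a request is sent. The following is a minimal sketch, assuming you want to fail early on out-of-range or misspelled options; `validate_json_config` is a hypothetical helper, not part of the SDK, and this page does not say how the service itself handles invalid or unlisted options.

```python
# Ranges copied from the quick-reference table.
NUMERIC_RANGES = {
    "temp": (0.0, 1.4),
    "cfg_coef": (1.0, 4.0),
    "padding_bonus": (-4.0, 4.0),
}
STRING_OPTIONS = {"rewrite_rules", "pronunciation_id"}


def validate_json_config(config: dict) -> dict:
    """Raise ValueError for options outside the documented table (hypothetical helper)."""
    for name, value in config.items():
        if name in NUMERIC_RANGES:
            low, high = NUMERIC_RANGES[name]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"{name} must be a number, got {value!r}")
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        elif name in STRING_OPTIONS:
            if not isinstance(value, str):
                raise ValueError(f"{name} must be a string, got {value!r}")
        else:
            # Assumption: only the five options listed in the table are allowed.
            raise ValueError(f"unknown option: {name}")
    return config


checked = validate_json_config({"temp": 0.3, "cfg_coef": 2.5, "padding_bonus": -1.0})
```

Rejecting unknown keys is a deliberate choice here to catch typos such as `padding-bonus`; relax it if you pass options that are not listed in the table.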

Speed control

Use the padding_bonus parameter to control speaking speed. The default is 0.0. Negative values (-4.0 to -0.1) make the speaker talk faster; positive values (0.1 to 4.0) make the speaker talk slower.

sample_text = "Hello, this is a test from the Gradium Text to Speech system. We are testing the speed."

# Positive padding_bonus: slower speech.
slower_audio = await client.tts(
    setup={'voice_id': 'YTpq7expH9539ERJ', 'output_format': 'wav', 'json_config': {'padding_bonus': 2.0}},
    text=sample_text,
)

# Negative padding_bonus: faster speech.
faster_audio = await client.tts(
    setup={'voice_id': 'YTpq7expH9539ERJ', 'output_format': 'wav', 'json_config': {'padding_bonus': -2.0}},
    text=sample_text,
)
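
To compare several speeds side by side, one option is to build a setup dict per padding_bonus value and synthesize the same text with each. The sketch below only constructs the setup dicts, so it runs without a client; the five values are an arbitrary illustrative spread across the documented range, not recommended settings.

```python
VOICE_ID = "YTpq7expH9539ERJ"  # voice ID reused from the examples above

# Illustrative values spanning the documented range, fastest first.
padding_values = [-4.0, -2.0, 0.0, 2.0, 4.0]

setups = [
    {
        "voice_id": VOICE_ID,
        "output_format": "wav",
        "json_config": {"padding_bonus": value},
    }
    for value in padding_values
]

# Each entry can then be passed as setup=... to client.tts, as in the calls above.
```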

Temperature control

The temp parameter sets the sampling temperature, from 0.0 to 1.4. A value of 0.0 gives deterministic generation, while higher values produce more diverse output. The default is 0.7.

setup = {'voice_id': 'YTpq7expH9539ERJ', 'output_format': 'wav', 'json_config': {'temp': 0.3}}

audio = await client.tts(text=sample_text, setup=setup)

Voice similarity control

The cfg_coef parameter controls how closely the generated speech matches the target voice. Values range from 1.0 to 4.0, and the default is 2.0. Higher values make the model replicate the target voice more closely, but very high values can introduce audio artifacts.
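
As a small sketch, cfg_coef is set through json_config in the same way as the other options. The value 3.0 below is only an illustration of a setting above the default, not a recommendation; the commented-out call mirrors the client.tts usage shown in the earlier sections.

```python
# cfg_coef above the 2.0 default: closer to the target voice, at a higher
# risk of artifacts. Lower it again if artifacts become audible.
setup = {
    "voice_id": "YTpq7expH9539ERJ",
    "output_format": "wav",
    "json_config": {"cfg_coef": 3.0},
}

# audio = await client.tts(text=sample_text, setup=setup)
```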

Rewrite rules

The rewrite_rules parameter passes text rewriting rules that are applied before the text is synthesized. The rules must be passed as a string. A language code such as "en", "fr", "de", "es", or "pt" enables all the rewriting rules for that language. The rules themselves are described in the Text Rewriting Rules guide.
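
For instance, enabling every English rewriting rule might look like the sketch below. It assumes the voice and the input text are English, which this page does not state for the example voice ID; the commented-out call follows the client.tts usage from the earlier sections.

```python
# "en" turns on all rewriting rules for English, per the description above.
setup = {
    "voice_id": "YTpq7expH9539ERJ",
    "output_format": "wav",
    "json_config": {"rewrite_rules": "en"},
}

# audio = await client.tts(text=sample_text, setup=setup)
```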

Passing json_config

config = {"temp": 0.3, "cfg_coef": 2.5, "padding_bonus": -1.0}

# Streaming over WebSocket: setup is keyword args; json_config stays nested.
async with client.tts_realtime(
    voice_id="YTpq7expH9539ERJ",
    output_format="pcm",
    json_config=config,
) as stream:
    ...

# One-shot via SDK: setup is a dict containing json_config.
audio = await client.tts(
    setup={"voice_id": "YTpq7expH9539ERJ", "output_format": "wav", "json_config": config},
    text="Hello",
)

For REST, see the TTS POST reference.