json_config parameter. In the Python API, this parameter is passed as a dictionary mapping
strings to values (either floats or strings).
This parameter can be used to control:
- Speed of the generated speech via the padding_bonus parameter.
- Stability of the generated speech via the temp (temperature) parameter.
- Voice similarity via the cfg_coef parameter.
- Text rewriting via the rewrite_rules parameter.
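Here is a minimal sketch of a json_config dictionary that sets all four parameters. The key names come from the list above. The numeric values are the documented defaults, and "en" is just one example rewrite_rules value. The sketch does not show how the dictionary is passed to a particular Python API call, and the JSON round-trip is only an assumption suggested by the parameter's name.

```python
import json

# Illustrative json_config: keys are the parameter names listed above,
# values are floats or strings as the Python API expects.
json_config = {
    "padding_bonus": 0.0,   # speed: negative = faster, positive = slower
    "temp": 0.7,            # temperature: 0 = deterministic
    "cfg_coef": 2.0,        # similarity to the cloned voice
    "rewrite_rules": "en",  # enable all English rewriting rules
}

# Every value must be a float or a string.
assert all(isinstance(v, (float, str)) for v in json_config.values())

# Assumption: the name suggests the config is JSON-serializable,
# so dumping it is a quick sanity check.
print(json.dumps(json_config))
```

This prints `{"padding_bonus": 0.0, "temp": 0.7, "cfg_coef": 2.0, "rewrite_rules": "en"}`.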
Speed Control
You can guide the speed of the generated speech using the padding_bonus parameter. The default value is 0.0. Negative values (between -4.0 and -0.1) make the speaker speak faster, and positive values (between 0.1 and 4.0) make the speaker speak slower.
Temperature Control
The generation temperature is set with the temp parameter, which accepts values from 0 to 1.4. A value of 0 gives deterministic generation, and higher values produce more diverse outputs. The default value is 0.7.
Voice Similarity Control
The cfg_coef parameter controls how similar the generated speech is to the target
voice. Values range from 1.0 to 4.0, and the default value is 2.0. The higher the
value, the more closely the model replicates the cloned voice, but larger values
can introduce audio artifacts.
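Each of the three numeric parameters has a documented range, so a small client-side check can catch mistakes before a request is sent. This is a sketch and is not part of the API. `make_config`, `NUMERIC_RANGES` and `DEFAULTS` are hypothetical helper names. The check only enforces the outer bounds listed above, so for padding_bonus it does not reject values between -0.1 and 0.1.

```python
# Documented ranges (inclusive) for the numeric json_config parameters.
NUMERIC_RANGES = {
    "padding_bonus": (-4.0, 4.0),
    "temp": (0.0, 1.4),
    "cfg_coef": (1.0, 4.0),
}

# Documented default values.
DEFAULTS = {"padding_bonus": 0.0, "temp": 0.7, "cfg_coef": 2.0}


def make_config(**overrides: float) -> dict[str, float]:
    """Return defaults updated with overrides, rejecting out-of-range values."""
    config = dict(DEFAULTS)
    for name, value in overrides.items():
        if name not in NUMERIC_RANGES:
            raise KeyError(f"unknown numeric parameter: {name}")
        low, high = NUMERIC_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} outside [{low}, {high}]")
        config[name] = float(value)
    return config
```

For example, `make_config(padding_bonus=-1.0, temp=0.3)` returns `{'padding_bonus': -1.0, 'temp': 0.3, 'cfg_coef': 2.0}`, which requests slightly faster and more stable speech. `make_config(cfg_coef=5.0)` raises `ValueError`.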
Rewrite Rules
The rewrite_rules parameter passes text rewriting rules that are applied before
the text is synthesized. The rules must be passed as a string. More details on
the rules themselves can be found in the Text Rewriting Rules guide. Language
codes such as "en", "fr", "de", "es" and "pt" enable all the rewriting rules for
the corresponding language.
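The sketch below shows the language shortcuts in use. It sets rewrite_rules on a copy of an existing config dictionary. `with_language_rules` and `LANGUAGE_SHORTCUTS` are hypothetical names. The set contains only the codes listed above, and because that list is given as examples, the service may accept other codes, so loosen the check if needed. Custom rule strings are not covered here; their syntax is described in the Text Rewriting Rules guide.

```python
# Only the example codes named in this guide; other codes may also be accepted.
LANGUAGE_SHORTCUTS = {"en", "fr", "de", "es", "pt"}


def with_language_rules(config: dict, language: str) -> dict:
    """Return a copy of config whose rewrite_rules enables all rules for a language."""
    if language not in LANGUAGE_SHORTCUTS:
        raise ValueError(f"no known rewrite-rule shortcut for {language!r}")
    # Build a new dict so the caller's config is left unchanged.
    return {**config, "rewrite_rules": language}
```

For example, `with_language_rules({"temp": 0.7}, "fr")` returns `{'temp': 0.7, 'rewrite_rules': 'fr'}`.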
WebSocket Stream Options
When using the real-time WebSocket API (tts_realtime or stt_realtime), you can control connection initialization behavior with two parameters:
send_setup_on_start
Controls whether the setup message is automatically sent when the context manager is entered. Defaults to True.
Set this to False when you need to send the setup message manually, for example when multiplexing, where each request has its own setup with a unique client_req_id.
wait_for_ready_on_start
Controls whether the client blocks waiting for the server’s ready message after sending setup. Defaults to False.
When set to False, the ready message is captured lazily during the normal receive loop. This reduces connection latency since you can start sending data immediately after setup without waiting for a round-trip.
When set to True, stream.ready is guaranteed to be populated before you start sending data, which can be useful if you need server-provided metadata (like sample rate) before proceeding.
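A self-contained asyncio simulation can show how eager and lazy handling of the ready message differ. None of this is the real client. `FakeServer`, `SketchStream`, the message dictionaries and the 24000 sample rate are hypothetical stand-ins, and the real tts_realtime/stt_realtime streams may differ in detail. Only the two flag names, their defaults and the `ready` attribute come from this section.

```python
import asyncio
from typing import Optional


class FakeServer:
    """Hypothetical in-memory server: answers setup with 'ready', then echoes."""

    def __init__(self, ready_delay: float = 0.05) -> None:
        self.inbox: asyncio.Queue = asyncio.Queue()   # client -> server
        self.outbox: asyncio.Queue = asyncio.Queue()  # server -> client
        self.ready_delay = ready_delay

    async def run(self) -> None:
        await self.inbox.get()                 # wait for the setup message
        await asyncio.sleep(self.ready_delay)  # simulated network round-trip
        await self.outbox.put({"type": "ready", "sample_rate": 24000})
        while (msg := await self.inbox.get()) is not None:
            await self.outbox.put({"type": "echo", "data": msg})


class SketchStream:
    """Hypothetical client mimicking send_setup_on_start / wait_for_ready_on_start."""

    def __init__(self, server: FakeServer, send_setup_on_start: bool = True,
                 wait_for_ready_on_start: bool = False) -> None:
        self.server = server
        self.send_setup_on_start = send_setup_on_start
        self.wait_for_ready_on_start = wait_for_ready_on_start
        self.ready: Optional[dict] = None

    async def __aenter__(self) -> "SketchStream":
        if self.send_setup_on_start:
            await self.send_setup()
            if self.wait_for_ready_on_start:
                # Eager: block for one round-trip so self.ready is populated.
                msg = await self.server.outbox.get()
                if msg["type"] != "ready":
                    raise RuntimeError(f"expected ready, got {msg}")
                self.ready = msg
        return self

    async def __aexit__(self, *exc_info) -> None:
        await self.server.inbox.put(None)  # tell the fake server to stop

    async def send_setup(self) -> None:
        await self.server.inbox.put({"type": "setup"})

    async def send(self, data: str) -> None:
        await self.server.inbox.put(data)

    async def receive(self) -> dict:
        while True:
            msg = await self.server.outbox.get()
            if msg["type"] == "ready":
                self.ready = msg  # Lazy: captured during the normal receive loop.
                continue
            return msg


async def demo(send_setup_on_start: bool, wait_for_ready_on_start: bool):
    server = FakeServer()
    server_task = asyncio.create_task(server.run())
    async with SketchStream(server, send_setup_on_start,
                            wait_for_ready_on_start) as stream:
        if not send_setup_on_start:
            await stream.send_setup()  # manual setup, e.g. per multiplexed request
        ready_before_first_send = stream.ready is not None
        await stream.send("hello")
        reply = await stream.receive()
    await server_task
    return ready_before_first_send, stream.ready is not None, reply["data"]
```

With the defaults, `asyncio.run(demo(True, False))` returns `(False, True, 'hello')`. Data is sent before the ready message arrives, and `ready` is filled in later by the receive loop. `asyncio.run(demo(True, True))` returns `(True, True, 'hello')`. It pays one extra round-trip up front, and in exchange the metadata is available before the first send.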
stt_realtime: