Speech Synthesis Markup Language (SSML) provides advanced
control over text-to-speech conversion, including the
ability to mix multiple speakers in one audio file. Here’s
how to use SSML in Kaizen Speech Studio:
Step 1: Understand SSML Basics
SSML is an XML-based markup language, standardized by the
W3C, for customizing voice output. It lets you specify voice
changes, pauses, and other speech parameters within your
text. Before you start, familiarize yourself with the core
elements: <speak> (the root element), <voice> (speaker
changes), <break> (pauses), and <prosody> (speed, pitch,
and volume).
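Because SSML is XML, characters such as & and < in ordinary text must be escaped, and the markup sits inside a <speak> root element. The Python sketch below wraps plain text in that root and checks that the result parses. The helper names to_ssml and is_well_formed are hypothetical, and the code only prepares text to paste into the editor; it does not call Kaizen Speech Studio.

```python
from xml.etree import ElementTree as ET
from xml.sax.saxutils import escape

# Namespace URI defined by the W3C SSML specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def to_ssml(text: str, lang: str = "en-US") -> str:
    """Wrap plain text in an SSML <speak> root element (hypothetical helper)."""
    # escape() turns &, < and > into &amp;, &lt; and &gt;.
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang="{lang}">'
        f"{escape(text)}</speak>"
    )


def is_well_formed(ssml: str) -> bool:
    """Return True if the string parses as XML, False otherwise."""
    try:
        ET.fromstring(ssml)
    except ET.ParseError:
        return False
    return True


print(is_well_formed(to_ssml("Tom & Jerry <live>")))  # True
print(is_well_formed("<speak>Tom & Jerry</speak>"))   # False: unescaped "&"
```

A parse check like this catches the most common mistake, an unescaped ampersand, before the text ever reaches the platform.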
Step 2: Input Your Text with SSML Tags
Navigate to the "Text Input" section of the dashboard.
Enter your text, adding SSML tags to mark where each
speaker begins. For example, wrap each speaker's lines in a
<voice> element to switch between male and female voices
or between different characters.
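Example (standard SSML wraps the markup in a <speak> root
element):
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice name="en-US-JennyNeural">Hello, I am Jenny.</voice>
  <voice name="en-US-GuyNeural">And I am Guy. Together, we'll explain how to use SSML.</voice>
</speak>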
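For dialogues with many turns, the same kind of two-speaker markup can be generated from a script instead of typed by hand. This is a minimal sketch using Python's standard xml.etree.ElementTree; the build_dialogue helper and its list of (voice name, line) pairs are assumptions for illustration. ElementTree escapes special characters in the text automatically.

```python
from xml.etree import ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize SSML as the default namespace instead of an "ns0:" prefix.
# Note: register_namespace() changes a module-wide setting.
ET.register_namespace("", SSML_NS)


def build_dialogue(lines, lang="en-US"):
    """Build SSML with one <voice> element per (voice_name, text) pair (hypothetical helper)."""
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.0", f"{{{XML_NS}}}lang": lang})
    for voice_name, text in lines:
        voice = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice_name})
        voice.text = text  # ElementTree escapes &, < and > when serializing
    return ET.tostring(speak, encoding="unicode")


script = [
    ("en-US-JennyNeural", "Hello, I am Jenny."),
    ("en-US-GuyNeural", "And I am Guy. Together, we'll explain how to use SSML."),
]
print(build_dialogue(script))
```

Building the tree programmatically rather than concatenating strings means every <voice> element is closed and every special character is escaped.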
Step 3: Choose Appropriate Voices
Ensure that the voices you specify in the SSML tags are
available in the "Voice Selection" tab. You can listen to
samples and confirm that the voices match your project's
requirements.
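In a long script it is easy to mistype a voice name. The sketch below lists every voice name in the SSML that is not in a set of available voices. That set is a hypothetical list you would copy by hand from the "Voice Selection" tab, and "en-US-ExampleNeural" is a made-up name used only to show a mismatch; fetching the platform's voice catalogue programmatically is not covered here.

```python
from xml.etree import ElementTree as ET


def missing_voices(ssml, available):
    """List voice names used in the SSML that are not in `available`, in order of first use."""
    root = ET.fromstring(ssml)
    missing = []
    for elem in root.iter():
        # Strip any "{namespace}" prefix so both namespaced and bare <voice> tags match.
        if elem.tag.rsplit("}", 1)[-1] != "voice":
            continue
        name = elem.get("name")
        if name is None:  # voice chosen by gender/age attributes rather than by name
            continue
        if name not in available and name not in missing:
            missing.append(name)
    return missing


# Hypothetical list copied by hand from the "Voice Selection" tab.
AVAILABLE = {"en-US-JennyNeural", "en-US-GuyNeural"}

ssml = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    '<voice name="en-US-JennyNeural">Hello, I am Jenny.</voice>'
    '<voice name="en-US-ExampleNeural">And I am a made-up voice.</voice>'
    "</speak>"
)
print(missing_voices(ssml, AVAILABLE))  # ['en-US-ExampleNeural']
```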
Step 4: Customize Each Voice
Adjust the speed, pitch, and volume of each voice, either
with the customization sliders in the platform’s interface
or directly in the SSML, using the rate, pitch, and volume
attributes of the <prosody> element.
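In SSML, per-voice adjustments go in a <prosody> element inside the <voice> element. The hypothetical helper below wraps one speaker's line in <prosody>, emitting only the attributes you set, since SSML 1.1 treats a <prosody> element with no attributes as an error. It returns a fragment that still has to be placed inside <speak>. The labeled values shown ("slow", "low") come from the SSML specification; whether Kaizen Speech Studio also accepts relative values such as "+10%" is an assumption to confirm in the preview.

```python
from xml.sax.saxutils import escape, quoteattr


def prosody_voice(voice_name, text, rate=None, pitch=None, volume=None):
    """Return a <voice> fragment whose text is wrapped in <prosody> (hypothetical helper).

    Only the settings you pass are emitted; unset ones keep the voice's default.
    """
    settings = {"rate": rate, "pitch": pitch, "volume": volume}
    attrs = "".join(f" {key}={quoteattr(value)}" for key, value in settings.items() if value is not None)
    body = escape(text)
    if attrs:  # SSML 1.1 treats <prosody> with no attributes as an error, so skip it
        body = f"<prosody{attrs}>{body}</prosody>"
    return f"<voice name={quoteattr(voice_name)}>{body}</voice>"


# Labeled values such as "slow", "low" and "loud" come from the SSML specification.
print(prosody_voice("en-US-GuyNeural", "And I am Guy.", rate="slow", pitch="low"))
# <voice name="en-US-GuyNeural"><prosody rate="slow" pitch="low">And I am Guy.</prosody></voice>
```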
Step 5: Preview the Mixed Audio
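Use the real-time preview feature to listen to the mixed
audio. Check that the transitions between speakers are
smooth and that the overall audio flows naturally; if a
speaker change sounds abrupt, add a short <break> pause at
the end of the previous speaker's lines.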
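A short pause before each speaker change gives the listener a beat before the next voice starts. This sketch builds a dialogue and appends a <break> to the end of every turn except the last, so the pause falls inside the outgoing speaker's <voice> element. The dialogue_with_pauses helper is hypothetical, and the 400ms default is an arbitrary starting value to tune by ear in the preview.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def dialogue_with_pauses(lines, pause="400ms", lang="en-US"):
    """Build SSML for a dialogue, ending every speaker turn except the last with a <break>."""
    parts = []
    for i, (voice_name, text) in enumerate(lines):
        # The last turn gets no trailing pause.
        brk = f"<break time={quoteattr(pause)}/>" if i < len(lines) - 1 else ""
        parts.append(f"<voice name={quoteattr(voice_name)}>{escape(text)}{brk}</voice>")
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        + "".join(parts)
        + "</speak>"
    )


print(dialogue_with_pauses([
    ("en-US-JennyNeural", "Hello, I am Jenny."),
    ("en-US-GuyNeural", "And I am Guy."),
]))
```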
Step 6: Generate and Download
Once you’re satisfied with the mixed audio, click the
"Generate" button. The platform processes the text and SSML
tags into a single audio file containing all speakers.
Download the file in your preferred format.