Kaizen Speech Studio

Using SSML to Mix Multiple Speakers in One Audio File

Speech Synthesis Markup Language (SSML) gives you fine-grained control over text-to-speech conversion, including the ability to mix multiple speakers in one audio file. Here’s how to use SSML in Kaizen Speech Studio:

Step 1: Understand SSML Basics
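SSML is an XML-based markup standard, published by the W3C, for customizing voice output. Tags placed in your text specify voice changes, pauses, and other speech parameters such as rate, pitch, and volume. Because SSML is XML, every opening tag needs a matching closing tag, and attribute values must be quoted. Familiarize yourself with the common tags and their functions to make the most of this feature.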

Step 2: Input Your Text with SSML Tags
Navigate to the "Text Input" section of the dashboard. Enter your text, adding SSML tags to mark where each speaker begins and ends. For example, you can wrap each passage in a <voice> tag to switch between male and female voices or between different characters.

Example:
<voice name="en-US-JennyNeural">Hello, I am Jenny.</voice>
<voice name="en-US-GuyNeural">And I am Guy. Together, we'll explain how to use SSML.</voice>
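For longer scripts, writing each <voice> element by hand is error-prone. The following is a minimal Python sketch that generates the markup from a list of (voice, text) pairs and escapes characters such as & and < that would otherwise break the XML. The function name build_dialogue is hypothetical, and the <speak> root element follows the W3C SSML convention; whether Kaizen Speech Studio requires that wrapper or accepts bare <voice> fragments like the example above is an assumption to check against the dashboard.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET


def build_dialogue(turns, lang="en-US"):
    """Return an SSML string with one <voice> element per (voice_name, text) turn."""
    parts = [
        f"  <voice name={quoteattr(name)}>{escape(text)}</voice>"
        for name, text in turns
    ]
    body = "\n".join(parts)
    # Assumption: the <speak> root follows the W3C SSML 1.0 convention; the
    # dashboard may or may not need it.
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>\n{body}\n</speak>"
    )


ssml = build_dialogue([
    ("en-US-JennyNeural", "Hello, I am Jenny."),
    ("en-US-GuyNeural", "And I am Guy. Together, we'll explain how to use SSML."),
])
ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed XML
print(ssml)
```

Parsing the result with ElementTree before pasting it into the "Text Input" section catches unclosed tags early; it checks only that the XML is well-formed, not that the platform supports every tag.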

Step 3: Choose Appropriate Voices
Ensure that the voices you specify in the SSML tags are available in the "Voice Selection" tab. You can listen to samples and confirm that the voices match your project's requirements.
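A quick way to catch a mistyped or unavailable voice name before generating audio is to compare the names in your markup against the ones you see in the tab. The sketch below assumes you have copied the available names by hand into a set; AVAILABLE_VOICES, missing_voices, and the sample voice list are hypothetical and not part of the platform. A regular expression is used instead of an XML parser so that bare <voice> fragments without a root element can be checked too.

```python
import re

# Hypothetical list, copied by hand from the "Voice Selection" tab.
AVAILABLE_VOICES = {"en-US-JennyNeural", "en-US-GuyNeural"}


def missing_voices(ssml, available):
    """Return voice names used in the SSML that are not in `available`, sorted."""
    used = set(re.findall(r'<voice\s+name="([^"]+)"', ssml))
    return sorted(used - set(available))


ssml = (
    '<voice name="en-US-JennyNeural">Hello, I am Jenny.</voice>\n'
    '<voice name="en-US-AriaNeural">And I am Aria.</voice>'
)
print(missing_voices(ssml, AVAILABLE_VOICES))  # prints ['en-US-AriaNeural']
```

The pattern matches only double-quoted name attributes that come first in the tag, which is how the example in Step 2 is written.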

Step 4: Customize Each Voice
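Adjust the speed, pitch, and volume of each voice. You can make these adjustments with the customization sliders in the platform’s interface or directly in the markup, depending on your preference. In standard SSML, the <prosody> tag carries these settings through its rate, pitch, and volume attributes.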
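To illustrate the in-markup route, here is a small Python sketch that nests the standard SSML <prosody> element inside each <voice> element. The helper name voice_turn is hypothetical, the attribute values are labels defined by the SSML specification (for example slow, low, loud), and it is an assumption that Kaizen Speech Studio honors <prosody> in the same way as its sliders.

```python
from xml.sax.saxutils import escape, quoteattr


def voice_turn(name, text, rate="medium", pitch="medium", volume="medium"):
    """Wrap text in <voice> and <prosody> using SSML's named rate/pitch/volume labels."""
    return (
        f"<voice name={quoteattr(name)}>"
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} "
        f"volume={quoteattr(volume)}>"
        f"{escape(text)}</prosody></voice>"
    )


# Jenny speaks slowly; Guy speaks lower and louder.
print(voice_turn("en-US-JennyNeural", "Hello, I am Jenny.", rate="slow"))
print(voice_turn("en-US-GuyNeural", "And I am Guy.", pitch="low", volume="loud"))
```

Keeping the settings in the markup makes them travel with the script, whereas slider settings live in the interface; which one takes precedence when both are set is not stated here and is worth testing in the preview.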

Step 5: Preview the Mixed Audio
Use the real-time preview feature to listen to the mixed audio. Ensure that the transitions between speakers are smooth and that the overall audio flows naturally.
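If a handover between speakers sounds abrupt in the preview, one option in standard SSML is a short <break> at the start of the incoming speaker's turn. The sketch below is one way to add it; dialogue_with_pauses and the 300 ms default are illustrative choices, and support for <break> in Kaizen Speech Studio is an assumption.

```python
from xml.sax.saxutils import escape, quoteattr


def dialogue_with_pauses(turns, pause_ms=300):
    """Build <voice> elements; every turn after the first opens with a <break>."""
    lines = []
    for index, (name, text) in enumerate(turns):
        # No pause before the first speaker, a fixed pause before each later one.
        pause = f'<break time="{int(pause_ms)}ms"/>' if index > 0 else ""
        lines.append(f"<voice name={quoteattr(name)}>{pause}{escape(text)}</voice>")
    return "\n".join(lines)


print(dialogue_with_pauses([
    ("en-US-JennyNeural", "Hello, I am Jenny."),
    ("en-US-GuyNeural", "And I am Guy."),
]))
```

The break is placed inside the <voice> element rather than between elements because some engines accept <break> only within a voice; adjust pause_ms and listen again until the transition sounds natural.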

Step 6: Generate and Download
Once you’re satisfied with the mixed audio, click the "Generate" button. The platform will process the text and SSML tags to create a seamless audio file with multiple speakers. Download the file in your preferred format.