SSML Support
Speech Synthesis Markup Language (SSML) gives you fine-grained control over how Speech Studio generates audio. With SSML, you can adjust pronunciation, add pauses, change speaking styles, and control emphasis on individual words and phrases.
What Is SSML?
SSML is an XML-based markup language used to control text-to-speech output. Instead of relying solely on the engine's default interpretation of your text, SSML lets you explicitly specify how each word or phrase should be spoken.
Enabling SSML Mode
- In the text-to-speech interface, toggle the SSML switch to enable SSML mode.
- The text input area now accepts SSML markup instead of plain text.
- Write or paste your SSML-formatted content.
Common SSML Tags
Pauses
Insert a pause at any point in the speech. In standard SSML, the time attribute takes a duration in milliseconds (ms) or seconds (s):
<break time="500ms"/>
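When SSML is generated from a script rather than typed by hand, pauses are often inserted between sentences. The following is a minimal Python sketch of that idea; the helper name `join_with_breaks` and its parameters are illustrative assumptions, not part of Speech Studio.

```python
from xml.sax.saxutils import escape


def join_with_breaks(sentences, pause_ms=500):
    """Join sentences with a <break/> of pause_ms milliseconds between them.

    join_with_breaks is a hypothetical helper for illustration only.
    """
    if pause_ms < 0:
        raise ValueError("pause_ms must not be negative")
    separator = f'<break time="{pause_ms}ms"/>'
    # escape() converts &, < and > so plain text cannot break the markup.
    return separator.join(escape(sentence) for sentence in sentences)


fragment = join_with_breaks(["Step one.", "Step two."], pause_ms=300)
print(fragment)
```

The result is a fragment, so it still needs to be placed inside a speak element before it is a complete SSML document.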
Speaking Rate and Pitch
Adjust speed and pitch for a section of text:
<prosody rate="slow" pitch="+10%">
This will be spoken slowly at a higher pitch.
</prosody>
Emphasis
Add emphasis to specific words:
<emphasis level="strong">important</emphasis>
Voice Style
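Switch the speaking style mid-speech (supported on select voices). The mstts prefix must be declared on the enclosing speak element, as in the example document below: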
<mstts:express-as style="cheerful">
Great news! Your order has shipped.
</mstts:express-as>
Pronunciation
Specify how a word should be pronounced using phonetic spelling:
<phoneme alphabet="ipa" ph="təˈmeɪtoʊ">tomato</phoneme>
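IPA strings contain non-ASCII characters and are easy to corrupt when copied by hand. One possible approach, sketched below with Python's standard `xml.etree.ElementTree` module, is to build the element in code and let the library handle quoting. The function name `phoneme` is a hypothetical helper for illustration, not part of Speech Studio.

```python
import xml.etree.ElementTree as ET


def phoneme(word, ipa):
    """Return a <phoneme> element as text, with the IPA string in the ph attribute.

    phoneme is a hypothetical helper for illustration only.
    """
    element = ET.Element("phoneme", {"alphabet": "ipa", "ph": ipa})
    element.text = word
    # encoding="unicode" returns a str and keeps the IPA characters as they are.
    return ET.tostring(element, encoding="unicode")


print(phoneme("tomato", "təˈmeɪtoʊ"))
```

If the markup is saved to a file before it is pasted, saving it as UTF-8 keeps the IPA characters intact.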
Example SSML Document
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">
  <voice name="en-US-JennyNeural">
    <mstts:express-as style="friendly">
      Welcome to our tutorial!
    </mstts:express-as>
    <break time="300ms"/>
    <prosody rate="-10%">
      Let me walk you through the steps carefully.
    </prosody>
  </voice>
</speak>
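Plain text that contains characters such as & or < has to be escaped before it is placed inside a document like the one above, otherwise the XML is no longer well-formed. The sketch below shows one way to wrap arbitrary text in the same speak, voice, and express-as skeleton using Python's standard library. The function `build_ssml` and its parameters are assumptions made for illustration; the namespace URIs are copied from the example document.

```python
from xml.sax.saxutils import escape, quoteattr

# Namespace URIs copied from the example document above.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"


def build_ssml(text, voice, style, lang="en-US"):
    """Wrap plain text in a speak/voice/express-as skeleton.

    build_ssml is a hypothetical helper for illustration only.
    escape() converts &, < and > in the text into entities.
    quoteattr() returns the attribute value already wrapped in quotes.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" '
        f'xmlns:mstts="{MSTTS_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        f"<mstts:express-as style={quoteattr(style)}>"
        f"{escape(text)}"
        "</mstts:express-as>"
        "</voice>"
        "</speak>"
    )


ssml = build_ssml("Terms & conditions apply.", "en-US-JennyNeural", "friendly")
print(ssml)
```

Building the string in one place keeps the namespace declarations and the escaping consistent across every document that is generated.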
SSML Editor
Speech Studio validates your SSML before conversion, highlighting any errors so you can correct them before generating audio.
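For SSML that is generated or stored outside the editor, a quick local check can catch basic mistakes before the content is pasted in. The sketch below only tests that the markup is well-formed XML with a speak root element. It is not Speech Studio's validator and does not check which tags or attributes a voice supports; `check_well_formed` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def check_well_formed(ssml):
    """Return (True, "") if ssml parses as XML with a <speak> root, else (False, reason).

    check_well_formed is a hypothetical helper for illustration only.
    """
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as err:
        # The parser's message includes the line and column of the problem.
        return False, str(err)
    # ElementTree reports namespaced tags as "{namespace}name"; keep the local name.
    local_name = root.tag.rsplit("}", 1)[-1]
    if local_name != "speak":
        return False, f"root element is <{local_name}>, expected <speak>"
    return True, ""


good = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
    'xml:lang="en-US">Hello<break time="300ms"/></speak>'
)
unclosed = '<speak><prosody rate="slow">Hello</speak>'
no_prefix = '<speak><mstts:express-as style="cheerful">Hi</mstts:express-as></speak>'

for sample in (good, unclosed, no_prefix):
    print(check_well_formed(sample))
```

The third sample fails because the mstts prefix is used without an xmlns:mstts declaration, which is an easy mistake to make when copying a single tag out of a larger document.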
Voice Compatibility
Not all SSML features are supported by every voice. Style elements such as mstts:express-as only work with voices that offer multiple speaking styles. Check the Voice Styles & Emotions guide for compatible voices.
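When many documents are prepared in advance, a small script can flag style tags that are used with a voice that does not list that style. The sketch below assumes documents declare the same namespaces as the example document on this page. The table `STYLES_BY_VOICE` is placeholder data that holds only the pairing used in that example, and it is meant to be filled in from the Voice Styles & Emotions guide. `unsupported_styles` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the example document on this page.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"

# Placeholder data: replace with entries from the Voice Styles & Emotions guide.
STYLES_BY_VOICE = {
    "en-US-JennyNeural": {"friendly"},
}


def unsupported_styles(ssml, styles_by_voice=STYLES_BY_VOICE):
    """Return (voice, style) pairs whose style is not listed for that voice.

    unsupported_styles is a hypothetical helper for illustration only.
    """
    root = ET.fromstring(ssml)
    problems = []
    for voice in root.iter(f"{{{SSML_NS}}}voice"):
        name = voice.get("name", "")
        allowed = styles_by_voice.get(name, set())
        for node in voice.iter(f"{{{MSTTS_NS}}}express-as"):
            style = node.get("style", "")
            if style not in allowed:
                problems.append((name, style))
    return problems


document = (
    f'<speak version="1.0" xmlns="{SSML_NS}" xmlns:mstts="{MSTTS_NS}" xml:lang="en-US">'
    '<voice name="en-US-JennyNeural">'
    '<mstts:express-as style="friendly">Welcome!</mstts:express-as>'
    '<mstts:express-as style="not-a-listed-style">Hello.</mstts:express-as>'
    "</voice>"
    "</speak>"
)
print(unsupported_styles(document))
```

A voice that is missing from the table is treated as having no styles, so every style tag inside it is reported. That errs on the side of prompting a manual check against the guide.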