Text-to-Speech — Kaizen Speech Studio Help

Text-to-Speech interface with editor, voice selector, and prosody controls

Basic flow

Paste text into the main editor (up to 250,000 characters on Free, unlimited on Pro).
Pick a voice from the dropdown or open the voice picker.
Adjust Rate / Pitch / Volume if needed.
(Optional) Pick a Style (cheerful, serious, etc.) — not all voices support all styles.
Click Convert Text to Speech. Audio appears in the player at the top.
Click Save as MP3 or Save as WAV.

Prosody controls

Rate — speaking speed. Choose x-slow, slow, default, fast, x-fast, or a custom percentage.
Pitch — how high or low the voice sounds. Choose x-low, low, default, high, x-high, or a custom value.
Volume — relative loudness. Choose x-soft, soft, default, loud, x-loud, or a custom value.
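
The preset names above match the named values of the standard SSML prosody element, so the three controls can be pictured as attributes on one tag. The following is a minimal Python sketch under that assumption; the helper name prosody is hypothetical, and the markup Kaizen Speech Studio actually generates is not documented here.

```python
from xml.sax.saxutils import escape, quoteattr

def prosody(text, rate=None, pitch=None, volume=None):
    """Wrap text in an SSML <prosody> element, skipping unset attributes."""
    settings = {"rate": rate, "pitch": pitch, "volume": volume}
    attrs = "".join(
        f" {name}={quoteattr(value)}"
        for name, value in settings.items()
        if value is not None
    )
    # escape() protects characters such as & and < inside the spoken text
    return f"<prosody{attrs}>{escape(text)}</prosody>"

print(prosody("Welcome back.", rate="slow", pitch="+5%", volume="loud"))
```

Leaving a control at its default corresponds to omitting the attribute, which is why unset values are skipped rather than written out.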

Voice styles

Many voices support style tags for emotional tone:

cheerful — upbeat, friendly
serious — measured, professional
customerservice — supportive, helpful
narration — audiobook-style
Plus dozens more — check each voice's profile in the picker
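
Because this page refers to Azure keys, a style is assumed here to map onto the mstts:express-as element from Azure Speech SSML. The sketch below shows that mapping in Python; the helper name styled is hypothetical and the voice name is only an example, so check the voice's profile in the picker for the styles it really supports.

```python
from xml.sax.saxutils import escape, quoteattr

def styled(text, voice, style):
    """Build a <voice> element whose content is spoken in the given style."""
    return (
        f"<voice name={quoteattr(voice)}>"
        f"<mstts:express-as style={quoteattr(style)}>"
        f"{escape(text)}"
        "</mstts:express-as></voice>"
    )

# Example voice name; assumed, not taken from this page
print(styled("Your order has shipped!", "en-US-JennyNeural", "cheerful"))
```

The mstts prefix only resolves when the enclosing speak element declares the mstts namespace, so this fragment is meant to sit inside a full SSML document.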

File import (Pro)

Click Import from TXT / PDF / DOC in the editor footer. Kaizen Speech Studio extracts the text and loads it into the editor. See file imports for details.

SSML mode

Toggle Use SSML format to switch from plain-text mode to SSML. The editor becomes an SSML editor with visual inserts for voice, style, prosody, breaks, and language switches. See SSML editor.
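
As a rough illustration of what an SSML document combining voice, style, prosody, break, and language-switch elements can look like, here is a Python sketch that also checks the markup is well-formed XML before it is pasted anywhere. The document follows Azure Speech SSML conventions, which is an assumption based on the Azure key mentioned on this page; the voice name and sentences are examples, and whether a given voice honors every element depends on the voice.

```python
import xml.etree.ElementTree as ET

# Example SSML; element choices and the voice name are illustrative assumptions
SSML = """\
<speak version="1.0" xml:lang="en-US"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xmlns:mstts="https://www.w3.org/2001/mstts">
  <voice name="en-US-JennyNeural">
    <mstts:express-as style="cheerful">
      Thanks for calling.
    </mstts:express-as>
    <break time="500ms"/>
    <prosody rate="slow">Please hold for a moment.</prosody>
    <lang xml:lang="es-ES">Gracias por su paciencia.</lang>
  </voice>
</speak>
"""

def element_names(ssml):
    """Parse SSML and return tag names without namespaces, in document order."""
    root = ET.fromstring(ssml)  # raises ET.ParseError if not well-formed
    return [el.tag.split("}")[-1] for el in root.iter()]

print(element_names(SSML))
```

A parse error at this stage usually points to an unclosed tag or an unescaped & or < in the text, which is cheaper to catch before running a conversion.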

Output

MP3 — best for sharing, good quality/size tradeoff
WAV — lossless, larger files
Copy audio file path to clipboard (for feeding into other tools)
Open in Explorer (jump to the saved file)

Free tier

The Free tier includes 9 hours of TTS per month on the shared Azure key, with a 250,000-character input cap. Adding your own Azure key (Pro) lifts both limits.
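
For text longer than the input cap, one option is to split it into pieces before pasting. The Python sketch below cuts at paragraph breaks where possible; the function name split_for_cap is hypothetical, and it assumes the cap counts characters the way Python's len does, which this page does not confirm.

```python
FREE_TIER_CAP = 250_000  # characters, per the limit stated above

def split_for_cap(text, cap=FREE_TIER_CAP):
    """Split text into chunks of at most `cap` characters, preferring paragraph breaks."""
    chunks = []
    while len(text) > cap:
        # Look for the last blank line that fits inside the cap
        cut = text.rfind("\n\n", 0, cap)
        if cut <= 0:
            cut = cap  # no usable paragraph break, so split at the cap itself
        chunks.append(text[:cut])
        text = text[cut:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks

sample = "First paragraph.\n\nSecond paragraph."
print(split_for_cap(sample, cap=20))
```

Each chunk can then be converted separately and the resulting audio files joined in another tool.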
