Kaizen Speech Studio Help
Core feature

Transcribe

Convert live microphone input or an uploaded file to text. Transcription uses Azure Speech-to-Text for near-human accuracy.

[Screenshot: Transcribe UI with microphone/file toggle, quality selector, and language dropdown]

Two input modes

  1. Microphone (live): speak into the mic and see words appear in near real time.
  2. Audio/Video File: upload a WAV, MP3, OGG, or FLAC file and Speech Studio transcribes it.

Live microphone flow

  1. Pick Microphone.
  2. Select your input device from the system mic dropdown.
  3. (Optional) Set Record to MP3 or WAV to save a copy of the audio alongside the transcript.
  4. Pick a language, or leave it on auto-detect.
  5. Click Start. Speak. Click Pause or Stop when done.
  6. Copy the transcript or save it as a text file.

File upload flow

  1. Pick Audio/Video File.
  2. Click Browse and select a WAV, MP3, OGG, or FLAC file.
  3. Pick a language, or leave it on auto-detect.
  4. Click Start. The transcript appears in the right pane.
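
When preparing a batch of files for upload, it can help to filter out unsupported formats first. The sketch below checks only the file extension against the four formats listed above. The names is_supported and split_by_support are hypothetical helpers, not part of Speech Studio, and the app's own validation may differ.

```python
from pathlib import Path

# Formats listed in the file upload flow above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac"}


def is_supported(path: str) -> bool:
    """Return True if the file extension matches a listed upload format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(paths):
    """Partition paths into (uploadable, rejected), preserving order."""
    ok = [p for p in paths if is_supported(p)]
    bad = [p for p in paths if not is_supported(p)]
    return ok, bad
```

For example, split_by_support(["interview.WAV", "notes.txt", "call.flac"]) puts the two audio files in the first list and notes.txt in the second. The check is case-insensitive but does not inspect file contents.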

Quality settings

High (44.1 kHz) is the default and is rarely worth lowering. Lower settings trade accuracy for speed.
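
To see how an existing recording compares with the High setting, you can read its sample rate from the file header. This is a minimal sketch that assumes the quality setting corresponds to audio sample rate, which the 44.1 kHz label suggests but this page does not confirm. It uses Python's standard wave module, which reads uncompressed PCM WAV files only. The function names are illustrative.

```python
import wave

HIGH_QUALITY_HZ = 44_100  # the "High" setting named above (assumed to be a sample rate)


def wav_sample_rate(path: str) -> int:
    """Read the sample rate in Hz from a WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def meets_high_quality(path: str) -> bool:
    """True if the recording is sampled at 44.1 kHz or higher."""
    return wav_sample_rate(path) >= HIGH_QUALITY_HZ
```

A recording sampled below 44.1 kHz still transcribes. The check only shows whether the source already matches the default setting.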

Languages

All 80+ Azure Speech languages are supported. Multi-language audio (e.g. a bilingual interview) works best if you specify both candidate languages in the advanced options.
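
Candidate languages are written as language-region tags such as en-US or es-MX. The sketch below shows one way to tidy a comma-separated list of tags before entering them. It is a loose shape check, not a full BCP 47 parser, and it does not verify that Azure Speech supports a given tag. The function name parse_candidates is hypothetical.

```python
import re

# Loose shape check for tags such as "en-US" or "fil-PH"; not a full BCP 47 parser.
LOCALE_PATTERN = re.compile(r"^[a-z]{2,3}-[A-Z]{2}$")


def parse_candidates(text: str) -> list[str]:
    """Split a comma-separated string into unique, well-formed locale tags."""
    seen = []
    for raw in text.split(","):
        tag = raw.strip()
        if not tag:
            continue  # ignore empty entries such as a trailing comma
        lang, _, region = tag.partition("-")
        normalized = f"{lang.lower()}-{region.upper()}"
        if not LOCALE_PATTERN.match(normalized):
            raise ValueError(f"not a language-region tag: {tag!r}")
        if normalized not in seen:
            seen.append(normalized)
    return seen
```

For a bilingual English and Spanish interview, parse_candidates("en-us, es-MX, en-US") normalizes the casing and drops the duplicate, leaving the two candidates en-US and es-MX.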

Free tier

Transcription isn't available on Free. Pro gets 5 hours/month via your own Azure key.
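
To plan a month's work against the Pro allowance, note that 5 hours is 300 minutes, or 18,000 seconds. The sketch below assumes usage is counted by audio duration, which this page does not state. Speech Studio may round or measure usage differently. The function names are illustrative.

```python
MONTHLY_QUOTA_SECONDS = 5 * 60 * 60  # 5 hours, the Pro allowance stated above


def remaining_seconds(used_seconds: float) -> float:
    """Seconds of transcription left this month, never below zero."""
    return max(0.0, MONTHLY_QUOTA_SECONDS - used_seconds)


def fits_in_quota(durations_seconds, used_seconds: float = 0.0) -> bool:
    """True if all listed files fit within the remaining allowance."""
    return sum(durations_seconds) <= remaining_seconds(used_seconds)
```

Three 90-minute recordings total 270 minutes and fit, leaving 30 minutes. A fourth would exceed the allowance.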

Tips for best accuracy

  • Quiet room, decent mic
  • One speaker at a time (diarization is limited)
  • Set the source language correctly; accuracy jumps 10-20% when it matches the audio
  • For heavy accents, pick the specific regional variant (e.g. en-IN instead of en-US)