Supported languages — Kaizen OCR & PDF Help

Tesseract languages (bundled)

Kaizen OCR ships with Tesseract trained data for 80+ languages, including:

Latin-script: English, German, French, Spanish, Italian, Portuguese, Dutch, Swedish, Norwegian, Danish, Finnish, Polish, Czech, Hungarian, Romanian, Turkish, Vietnamese, Indonesian, Swahili
Cyrillic: Russian, Ukrainian, Bulgarian, Serbian, Macedonian
East Asian: Chinese (Simplified), Chinese (Traditional), Japanese, Korean
South Asian: Hindi, Bengali, Tamil, Telugu, Kannada, Malayalam, Gujarati, Marathi, Nepali, Sinhala, Urdu, Punjabi
Middle Eastern: Arabic, Hebrew, Persian (Farsi)
Other: Greek, Thai, Khmer, Burmese, Georgian, Armenian, Amharic, Tigrinya

Paddle AI languages

Paddle's models are multilingual by default: the primary model handles Latin-script languages, Chinese, and Japanese automatically. For specialty languages, pick the dedicated model in Settings → OCR engine → Paddle language.

Azure Document Intelligence

If you use the Searchable PDF feature with Azure, you get access to 160+ languages. Azure auto-detects the language in most cases, so no configuration is needed. For the full list, see Microsoft's documentation.

Mixed-language documents

All three engines handle pages with multiple scripts (e.g. English headings with Chinese body text). For best results:

Paddle: use the multi model
Tesseract: specify multiple languages separated by + (e.g. eng+chi_sim) in Settings
Azure: no configuration needed; it auto-detects the language per block
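
For Tesseract, the +-separated value can be assembled from a list of language codes. The snippet below is a minimal Python sketch and not part of Kaizen OCR: the helper name build_lang_string is hypothetical, and the codes (eng, chi_sim) follow Tesseract's trained-data naming as in the example above. It assumes the Settings field takes the joined string as-is.

```python
def build_lang_string(codes):
    """Join Tesseract language codes with '+', dropping blanks and duplicates.

    Hypothetical helper for illustration; Kaizen OCR itself only needs the
    resulting string (e.g. "eng+chi_sim") typed into Settings.
    """
    seen = []
    for code in codes:
        code = code.strip()
        if code and code not in seen:
            seen.append(code)  # keep first-seen order
    if not seen:
        raise ValueError("at least one language code is required")
    return "+".join(seen)


print(build_lang_string(["eng", "chi_sim"]))  # eng+chi_sim
```

Listing only the languages that actually appear in the document keeps the value short and easy to check.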

RTL languages

Arabic, Hebrew, and Persian text is recognized and output in the correct reading direction. In the text pane, RTL text renders right-to-left; copying to clipboard preserves the correct Unicode order.
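
If you post-process copied text, you can check whether it contains RTL characters and confirm that it is stored in reading order. The following is a minimal Python sketch using only the standard library. The helper name contains_rtl is hypothetical, and it assumes that "correct Unicode order" means logical order, where the first code point is the first letter read.

```python
import unicodedata


def contains_rtl(text):
    """Return True if any character has a right-to-left bidi class.

    'R' covers Hebrew letters; 'AL' covers Arabic-script letters
    (used for Arabic and Persian).
    """
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


# Hebrew word "shalom", written with escapes so the source stays ASCII.
sample = "\u05e9\u05dc\u05d5\u05dd"

print(contains_rtl(sample))    # True
print(contains_rtl("hello"))   # False
print(sample[0] == "\u05e9")   # True: the first code point is the first letter read
```

Display direction is decided when the text is rendered, so the stored string itself does not need to be reversed.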

Performance notes

Tesseract is fastest on English and other single-script Latin languages.
Paddle is fastest on Chinese, Japanese, and mixed-script content.
Complex scripts (Thai, Khmer, Arabic) are slower than Latin scripts on all engines. Plan for about twice the processing time for text of similar length.
