Supported Languages

Kaizen OCR & PDF supports text extraction in 100+ languages through two OCR engines: Paddle OCR and Tesseract OCR. This covers major languages such as English, Chinese, Arabic, and Hindi. The most commonly used languages are listed below.

Language Paddle OCR Tesseract OCR
English
Spanish (Español)
French (Français)
German (Deutsch)
Italian (Italiano)
Portuguese (Português)
Chinese (Simplified)
Chinese (Traditional)
Japanese
Korean
Arabic
Hindi
Russian
Dutch (Nederlands)
Polish (Polski)
Turkish (Türkçe)
Vietnamese (Tiếng Việt)
Thai
Greek
Hebrew

Full Language List

Kaizen OCR supports the following languages, grouped by region:

European Languages

Afrikaans, Albanian, Basque, Bosnian, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hungarian, Icelandic, Irish, Italian, Latvian, Lithuanian, Luxembourgish, Macedonian, Maltese, Norwegian, Polish, Portuguese, Romanian, Serbian, Slovak, Slovenian, Spanish, Swedish, Welsh

Asian Languages

Bengali, Chinese (Simplified), Chinese (Traditional), Gujarati, Hindi, Japanese, Kannada, Korean, Malayalam, Marathi, Nepali, Punjabi, Sinhala, Tamil, Telugu, Thai, Urdu, Vietnamese

Middle Eastern & African Languages

Arabic, Farsi (Persian), Hebrew, Kurdish, Swahili, Turkish

Central Asian & Other Languages

Armenian, Azerbaijani, Georgian, Kazakh, Kyrgyz, Mongolian, Tajik, Uzbek

Multi-Language OCR

Kaizen OCR can process several languages simultaneously, so you can extract text from a document that mixes languages in a single scan. This is especially useful for:

  • Multilingual business documents
  • Academic papers with references in multiple languages
  • International contracts and legal documents
  • Travel documents and passports

For Best Results

When processing documents with multiple languages, Paddle OCR generally provides better accuracy for mixed-language content. Use Advanced OCR to select specific languages for optimal results.
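For readers who script the Tesseract engine directly, outside the app, the sketch below shows how a selection of languages is commonly expressed. Tesseract accepts several languages joined with `+` (for example `eng+ara`), using its own traineddata codes. The helper name `tesseract_lang_string` and the mapping table are hypothetical and for illustration only; they are not part of Kaizen OCR, and the codes shown cover only some of the languages listed on this page.

```python
# Hypothetical helper, not part of Kaizen OCR: maps display names used on
# this page to Tesseract traineddata codes and joins them with "+".
TESSERACT_CODES = {
    "English": "eng",
    "Spanish": "spa",
    "French": "fra",
    "German": "deu",
    "Italian": "ita",
    "Portuguese": "por",
    "Chinese (Simplified)": "chi_sim",
    "Chinese (Traditional)": "chi_tra",
    "Japanese": "jpn",
    "Korean": "kor",
    "Arabic": "ara",
    "Hindi": "hin",
    "Russian": "rus",
    "Hebrew": "heb",
}


def tesseract_lang_string(languages):
    """Return a Tesseract language string such as 'eng+ara'."""
    codes = []
    for name in languages:
        if name not in TESSERACT_CODES:
            raise ValueError(f"No Tesseract code known for {name!r}")
        code = TESSERACT_CODES[name]
        if code not in codes:  # keep the caller's order, drop duplicates
            codes.append(code)
    if not codes:
        raise ValueError("Select at least one language")
    return "+".join(codes)


if __name__ == "__main__":
    # A contract written in English with Arabic passages
    print(tesseract_lang_string(["English", "Arabic"]))
```

The resulting string is in the form the Tesseract command line takes for its `-l` option, for example `tesseract scan.png out -l eng+ara`, where `scan.png` and `out` are placeholder file names. Listing only the languages actually present in the document tends to work better than selecting many at once.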

Right-to-Left (RTL) Language Support

Kaizen OCR fully supports right-to-left languages including:

  • Arabic
  • Hebrew
  • Farsi (Persian)
  • Urdu

The OCR engine correctly handles RTL text direction and produces properly formatted output.
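If you post-process extracted text yourself, it can help to know whether a block is predominantly right-to-left, for instance to set the text alignment in your own viewer. The sketch below is one possible approach using Python's standard `unicodedata` module; it is not how Kaizen OCR works internally, and the function names `rtl_ratio` and `is_mostly_rtl` as well as the 0.5 threshold are assumptions made for this example.

```python
import unicodedata


def rtl_ratio(text):
    """Share of strongly directional characters that are right-to-left.

    Uses Unicode bidirectional classes: "R" (e.g. Hebrew letters) and
    "AL" (Arabic letters, also used for Farsi and Urdu) count as RTL,
    "L" counts as left-to-right. Digits, spaces, and punctuation are
    ignored because they have no strong direction.
    """
    classes = [unicodedata.bidirectional(ch) for ch in text]
    rtl = sum(1 for c in classes if c in ("R", "AL"))
    ltr = sum(1 for c in classes if c == "L")
    total = rtl + ltr
    return rtl / total if total else 0.0


def is_mostly_rtl(text, threshold=0.5):
    """True when more than `threshold` of the directional characters are RTL."""
    return rtl_ratio(text) > threshold


if __name__ == "__main__":
    for sample in ["Hello world", "שלום עולם", "مرحبا"]:
        print(repr(sample), is_mostly_rtl(sample))
```

A check like this only classifies the text; rendering mixed-direction lines correctly is still up to the display layer, which normally applies the Unicode bidirectional algorithm.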