Supported Languages
Kaizen OCR & PDF supports text extraction in 100+ languages with both the Paddle OCR and Tesseract OCR engines, including English, Chinese, Arabic, Hindi, and other major languages.
Popular Languages
| Language | Paddle OCR | Tesseract OCR |
|---|---|---|
| English | ||
| Spanish (Español) | ||
| French (Français) | ||
| German (Deutsch) | ||
| Italian (Italiano) | ||
| Portuguese (Português) | ||
| Chinese (Simplified) | ||
| Chinese (Traditional) | ||
| Japanese | ||
| Korean | ||
| Arabic | ||
| Hindi | ||
| Russian | ||
| Dutch (Nederlands) | ||
| Polish (Polski) | ||
| Turkish (Türkçe) | ||
| Vietnamese (Tiếng Việt) | ||
| Thai | ||
| Greek | ||
| Hebrew | ||
Full Language List
Kaizen OCR supports the following language families and scripts:
European Languages
Afrikaans, Albanian, Basque, Bosnian, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hungarian, Icelandic, Irish, Italian, Latvian, Lithuanian, Luxembourgish, Macedonian, Maltese, Norwegian, Polish, Portuguese, Romanian, Serbian, Slovak, Slovenian, Spanish, Swedish, Welsh
Asian Languages
Bengali, Chinese (Simplified), Chinese (Traditional), Gujarati, Hindi, Japanese, Kannada, Korean, Malayalam, Marathi, Nepali, Punjabi, Sinhala, Tamil, Telugu, Thai, Urdu, Vietnamese
Middle Eastern & African Languages
Arabic, Farsi (Persian), Hebrew, Kurdish, Swahili, Turkish
Central Asian & Other Languages
Armenian, Azerbaijani, Georgian, Kazakh, Kyrgyz, Mongolian, Tajik, Uzbek
Multi-Language OCR
Kaizen OCR supports simultaneous multi-language processing, meaning you can extract text from documents that contain multiple languages in a single scan. This is especially useful for:
- Multilingual business documents
- Academic papers with references in multiple languages
- International contracts and legal documents
- Travel documents and passports
For Best Results
When processing documents with multiple languages, Paddle OCR generally provides better accuracy for mixed-language content. Use Advanced OCR to select specific languages for optimal results.
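As an illustration of how language selection works in the underlying Tesseract engine, the sketch below builds the value for Tesseract's `-l` option, which accepts several language codes joined with `+`. How Kaizen OCR passes the selected languages to its engines is not described here, so the helper names `tesseract_lang_arg` and `tesseract_command` are hypothetical; only the language codes and the `-l` flag come from Tesseract itself.

```python
# Tesseract language codes for the languages in the Popular Languages table.
TESSERACT_CODES = {
    "English": "eng",
    "Spanish": "spa",
    "French": "fra",
    "German": "deu",
    "Italian": "ita",
    "Portuguese": "por",
    "Chinese (Simplified)": "chi_sim",
    "Chinese (Traditional)": "chi_tra",
    "Japanese": "jpn",
    "Korean": "kor",
    "Arabic": "ara",
    "Hindi": "hin",
    "Russian": "rus",
    "Dutch": "nld",
    "Polish": "pol",
    "Turkish": "tur",
    "Vietnamese": "vie",
    "Thai": "tha",
    "Greek": "ell",
    "Hebrew": "heb",
}


def tesseract_lang_arg(languages):
    """Join language names into the value Tesseract expects for -l, e.g. 'eng+fra'."""
    unknown = [name for name in languages if name not in TESSERACT_CODES]
    if unknown:
        raise ValueError(f"No Tesseract code known for: {', '.join(unknown)}")
    # dict.fromkeys drops duplicates while keeping the caller's order.
    codes = dict.fromkeys(TESSERACT_CODES[name] for name in languages)
    return "+".join(codes)


def tesseract_command(image_path, output_base, languages):
    """Build (but do not run) a tesseract CLI invocation as an argument list."""
    return ["tesseract", image_path, output_base, "-l", tesseract_lang_arg(languages)]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(tesseract_command("contract.png", "contract", ["English", "Arabic"]))
```

Listing only the languages that actually appear in the document, rather than every available one, keeps the engine from matching text against unrelated scripts.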
Right-to-Left (RTL) Language Support
Kaizen OCR fully supports right-to-left languages, including:
- Arabic
- Hebrew
- Farsi (Persian)
- Urdu
The OCR engine correctly handles RTL text direction and produces properly formatted output.
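When extracted text is displayed or post-processed elsewhere, it can help to know whether it is predominantly right-to-left, for example to set the text direction of the container that shows it. The sketch below is one way to check this with Python's standard `unicodedata` module; it is not part of Kaizen OCR, and the function name `is_mostly_rtl` is hypothetical.

```python
import unicodedata

# Unicode bidirectional classes for strong right-to-left characters:
# "R" covers Hebrew-type letters, "AL" covers Arabic-type letters
# (Arabic, Farsi, Urdu).
RTL_CLASSES = {"R", "AL"}


def is_mostly_rtl(text):
    """Return True if strong RTL characters outnumber strong LTR characters."""
    rtl = 0
    ltr = 0
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in RTL_CLASSES:
            rtl += 1
        elif bidi == "L":
            ltr += 1
        # Digits, spaces, and punctuation are direction-neutral or weak,
        # so they are not counted either way.
    return rtl > ltr


if __name__ == "__main__":
    for sample in ["שלום עולם", "مرحبا OCR", "Hello world"]:
        print(sample, is_mostly_rtl(sample))
```

Counting only strong characters means that numbers and punctuation embedded in an Arabic or Hebrew line do not tip the result toward left-to-right.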