Multilingual OCR
Some documents contain text in more than one language. OCR & PDF Tools can handle multilingual documents by combining multiple language packs during recognition.
How Multilingual OCR Works
The Tesseract OCR engine can load multiple language models simultaneously. When processing a multilingual document, it attempts to recognize characters from all selected languages, choosing the best match for each word or character.
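For readers who script Tesseract directly instead of using the app, the sketch below shows how several language models can be requested in a single pass. It assumes the `pytesseract` wrapper, Pillow, and the `eng`, `fra`, and `deu` language data are installed. The helper names `build_lang_arg` and `ocr_multilingual` are hypothetical, and nothing here describes how OCR & PDF Tools calls the engine internally.

```python
from typing import Sequence


def build_lang_arg(codes: Sequence[str]) -> str:
    """Join Tesseract language codes with '+', the separator Tesseract expects."""
    cleaned = [c.strip() for c in codes if c.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def ocr_multilingual(image_path: str, codes: Sequence[str]) -> str:
    """Run OCR on one image with every listed language model loaded."""
    # Imported lazily so build_lang_arg works without the OCR stack installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=build_lang_arg(codes))


if __name__ == "__main__":
    print(build_lang_arg(["eng", "fra", "deu"]))  # eng+fra+deu
```

Calling `ocr_multilingual("page.png", ["eng", "fra", "deu"])` would then recognize a page with all three models loaded; `page.png` is a placeholder file name.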
Setting Up Multilingual OCR
Step 1: Install Required Language Packs
Make sure a language pack is installed for every language that appears in your document. Go to Settings > Language Packs and install any that are missing.
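Outside the app, a similar check can be made against a standalone Tesseract installation. The following is a minimal sketch, assuming the `tesseract` binary is on the `PATH`; the function names are hypothetical, and the parser assumes the output of `tesseract --list-langs` is a header line followed by one language code per line.

```python
import subprocess
from typing import Iterable, List


def parse_list_langs(output: str) -> List[str]:
    """Extract language codes from `tesseract --list-langs` output.

    The header line ("List of available languages ...") is skipped; every
    remaining non-empty line is taken as one language code.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return [line for line in lines if not line.lower().startswith("list of")]


def installed_languages() -> List[str]:
    """Ask the local Tesseract binary which language models it can load."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=True,
    )
    # Fall back to stderr in case this build writes the list there.
    return parse_list_langs(result.stdout or result.stderr)


def missing_languages(required: Iterable[str], installed: Iterable[str]) -> List[str]:
    """Return required codes that are not installed, preserving their order."""
    available = set(installed)
    return [code for code in required if code not in available]
```

For example, `missing_languages(["eng", "deu"], installed_languages())` lists the codes that still need to be installed before a mixed English and German document can be processed.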
Step 2: Select Multiple Languages
- Click the Language dropdown in the toolbar.
- Hold Ctrl and click to select multiple languages.
- The selected languages appear in the dropdown display (e.g., "English + French + German").
Step 3: Run OCR
Process the image or document as usual. The OCR engine will use all selected language models to recognize text.

Tips for Best Results
Language Order Matters
Place the primary (most common) language first in your selection. The OCR engine gives priority to the first selected language, using additional languages for words or characters that do not match the primary language.
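When scripting Tesseract directly, the same ordering is expressed by the position of each code in the `+`-joined language string. The sketch below is one way to build that order; `order_languages` is a hypothetical helper, and the example document is invented for illustration.

```python
from typing import Iterable, List


def order_languages(primary: str, others: Iterable[str]) -> List[str]:
    """Place the primary language first and drop duplicates, keeping order."""
    ordered: List[str] = []
    for code in [primary, *others]:
        if code not in ordered:
            ordered.append(code)
    return ordered


# A mostly French document with some English and German passages.
selection = order_languages("fra", ["eng", "fra", "deu"])
print("+".join(selection))  # fra+eng+deu
```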
Limit the Number of Languages
While you can select many languages simultaneously, using too many can:
- Slow down processing
- Increase the chance of misrecognition (the engine may confuse characters between similar scripts)
For best accuracy, select only the languages you know are present in the document.
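In a scripted workflow, this advice amounts to filtering a default selection down to the languages known to occur in a given document. A minimal sketch, with the hypothetical helper `trim_selection` and invented example values:

```python
from typing import Iterable, List


def trim_selection(selected: Iterable[str], present: Iterable[str]) -> List[str]:
    """Keep only the selected languages known to occur in the document."""
    known = set(present)
    return [code for code in selected if code in known]


selected = ["eng", "fra", "deu", "spa", "ita"]  # a broad default selection
present = {"eng", "deu"}                        # languages actually in the document
print(trim_selection(selected, present))  # ['eng', 'deu']
```

The original order of the selection is preserved, so a primary language placed first stays first.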
Same-Script Languages
Languages that share the same script (e.g., English, French, and Spanish all use Latin characters) work well together. Combining languages with very different scripts (e.g., English and Chinese) also works but may require more processing time.
Common Multilingual Scenarios
| Scenario | Recommended Languages |
|---|---|
| European business documents | English + French + German |
| Academic papers with citations | English + source language |
| Bilingual signs or menus | Primary language + secondary language |
| Mixed English and CJK text | English + Chinese/Japanese/Korean |
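For direct Tesseract use, the combinations in the table above correspond to `+`-joined language codes. The mapping below is a sketch using Tesseract's standard codes (`eng`, `fra`, `deu`, `chi_sim`, `chi_tra`, `jpn`, `kor`); the scenario keys and the `lang_arg_for` helper are hypothetical names chosen for this example.

```python
from typing import Dict, List

# Tesseract language codes for some of the scenarios in the table.
# The dictionary keys are illustrative labels, not app settings.
SCENARIO_LANGS: Dict[str, List[str]] = {
    "european_business": ["eng", "fra", "deu"],
    "english_chinese_simplified": ["eng", "chi_sim"],
    "english_chinese_traditional": ["eng", "chi_tra"],
    "english_japanese": ["eng", "jpn"],
    "english_korean": ["eng", "kor"],
}


def lang_arg_for(scenario: str) -> str:
    """Return the '+'-joined language string for a named scenario."""
    return "+".join(SCENARIO_LANGS[scenario])


print(lang_arg_for("european_business"))  # eng+fra+deu
```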
Script Mixing
Documents that intermix scripts extensively (e.g., Arabic and Latin characters on the same line) may require additional preprocessing with Advanced OCR for optimal results.
Troubleshooting Multilingual OCR
- Wrong characters recognized -- Verify the correct languages are selected. Remove any unnecessary languages.
- Partial recognition -- Check that all relevant language packs are installed.
- Slow processing -- Reduce the number of selected languages to only those present in the document.