Multilingual OCR

Some documents contain text in more than one language. OCR & PDF Tools can handle multilingual documents by combining multiple language packs during recognition.

How Multilingual OCR Works

The Tesseract OCR engine can load multiple language models simultaneously. When processing a multilingual document, it attempts to recognize characters from all selected languages, choosing the best match for each word or character.
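For readers who script Tesseract directly, the combined model selection is written as language codes joined with `+` and passed to the `-l` option. The following is a minimal sketch, assuming the standalone `tesseract` command-line tool and its language codes (`eng`, `fra`, `deu`). The helper name and file names are hypothetical, and OCR & PDF Tools may invoke the engine differently.

```python
def build_tesseract_command(image_path, output_base, languages):
    """Return the argument list for a multilingual Tesseract run.

    The command is only built here, not executed.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    # Tesseract accepts several languages joined with '+', e.g. "eng+fra+deu".
    lang_spec = "+".join(languages)
    return ["tesseract", image_path, output_base, "-l", lang_spec]


# Hypothetical file names, used only for illustration.
command = build_tesseract_command("scan.png", "scan_text", ["eng", "fra", "deu"])
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run` on a machine where Tesseract and the three language models are installed.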

Setting Up Multilingual OCR

Step 1: Install Required Language Packs

Make sure a language pack is installed for every language that appears in your document. Go to Settings > Language Packs and install any that are missing.
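Outside the application, the same check can be approximated against a standalone Tesseract installation, whose `--list-langs` option prints the installed language codes. The sketch below assumes that output consists of one header line followed by one code per line. The function names and the sample text are illustrative, and the language packs managed in Settings may not map one-to-one to these codes.

```python
def parse_installed_languages(list_langs_output):
    """Extract language codes from text printed by `tesseract --list-langs`."""
    lines = [line.strip() for line in list_langs_output.splitlines()]
    non_empty = [line for line in lines if line]
    # Assumption: the first non-empty line is a header, the rest are codes.
    return set(non_empty[1:])


def missing_languages(required, list_langs_output):
    """Return the required codes that are not installed, in their given order."""
    installed = parse_installed_languages(list_langs_output)
    return [code for code in required if code not in installed]


# Sample text hard-coded for illustration; real output depends on the installation.
sample_output = """List of available languages in "/usr/share/tessdata/" (3):
eng
fra
osd
"""

print(missing_languages(["eng", "fra", "deu"], sample_output))
```

With this sample, only `deu` is reported as missing, so German would need to be installed before running recognition.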

Step 2: Select Multiple Languages

Click the Language dropdown in the toolbar.
Hold Ctrl and click to select multiple languages.
The selected languages appear in the dropdown display (e.g., "English + French + German").

Step 3: Run OCR

Process the image or document as usual. The OCR engine will use all selected language models to recognize text.

Tips for Best Results

Language Order Matters

Place the primary (most common) language first in your selection. The OCR engine gives priority to the first selected language, using additional languages for words or characters that do not match the primary language.
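When the language selection is assembled in a script rather than in the dropdown, the same ordering rule can be applied before the codes are joined. This is a small sketch with a hypothetical helper name. It assumes Tesseract-style codes and simply moves the primary language to the front while dropping duplicates.

```python
def order_languages(primary, others):
    """Put the primary language first and drop duplicates, keeping the given order."""
    ordered = [primary]
    for code in others:
        if code not in ordered:
            ordered.append(code)
    return "+".join(ordered)


# A mostly French document with some English and German passages.
print(order_languages("fra", ["eng", "fra", "deu"]))  # fra+eng+deu
```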

Limit the Number of Languages

While you can select many languages simultaneously, using too many can:

Slow down processing
Increase the chance of misrecognition (the engine may confuse visually similar characters that belong to different languages or scripts)

For best accuracy, select only the languages you know are present in the document.

Same-Script Languages

Languages that share the same script (e.g., English, French, and Spanish all use Latin characters) work well together. Combining languages with very different scripts (e.g., English and Chinese) also works but may require more processing time.

Common Multilingual Scenarios

Scenario	Recommended Languages
European business documents	English + French + German
Academic papers with citations	English + source language
Bilingual signs or menus	Primary language + secondary language
Mixed English and CJK text	English + Chinese/Japanese/Korean
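For scripted use, some of these scenarios can be written down as Tesseract language codes. The mapping below is an illustrative sketch, not a list taken from the application. It covers only the rows that name specific languages, and it assumes the standard Tesseract codes `eng`, `fra`, `deu`, `chi_sim`, `jpn`, and `kor`.

```python
# Illustrative mapping from scenario to Tesseract language codes (assumed, not
# exported by OCR & PDF Tools). The primary language is listed first.
SCENARIO_LANGUAGES = {
    "European business documents": ["eng", "fra", "deu"],
    "Mixed English and Simplified Chinese text": ["eng", "chi_sim"],
    "Mixed English and Japanese text": ["eng", "jpn"],
    "Mixed English and Korean text": ["eng", "kor"],
}


def language_spec(scenario):
    """Return the '+'-joined language string for a known scenario."""
    return "+".join(SCENARIO_LANGUAGES[scenario])


print(language_spec("Mixed English and Japanese text"))  # eng+jpn
```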

Script Mixing

Documents that intermix scripts extensively (e.g., Arabic and Latin characters on the same line) may require additional preprocessing with Advanced OCR for optimal results.
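One rough way to spot such lines in already extracted text (for example, from a quick first pass) is to look at the Unicode names of the letters. The sketch below is a heuristic and not part of OCR & PDF Tools. It uses only Python's standard `unicodedata` module, and the script buckets and function names are assumptions chosen for illustration.

```python
import unicodedata

# Coarse script buckets keyed by the first word of the Unicode character name.
SCRIPT_PREFIXES = {
    "LATIN": "Latin",
    "ARABIC": "Arabic",
    "CYRILLIC": "Cyrillic",
    "GREEK": "Greek",
    "CJK": "CJK",
    "HIRAGANA": "CJK",
    "KATAKANA": "CJK",
    "HANGUL": "CJK",
}


def scripts_in_line(text):
    """Return the set of script buckets used by the letters in a line of text."""
    scripts = set()
    for char in text:
        if not char.isalpha():
            continue  # digits, spaces, and punctuation are shared across scripts
        name = unicodedata.name(char, "")
        prefix = name.split(" ", 1)[0]
        script = SCRIPT_PREFIXES.get(prefix)
        if script:
            scripts.add(script)
    return scripts


def is_mixed_script(text):
    """True when a line contains letters from more than one script bucket."""
    return len(scripts_in_line(text)) > 1


# Latin letters followed by Arabic letters, written as escapes to avoid
# display issues with right-to-left text.
line = "Invoice \u0641\u0627\u062a\u0648\u0631\u0629 2024"
print(is_mixed_script(line))
```

Lines flagged this way are candidates for the extra preprocessing described above, while single-script lines can usually go through the normal workflow.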

Troubleshooting Multilingual OCR

Wrong characters recognized -- Verify that the correct languages are selected, and remove any that do not appear in the document.
Partial recognition -- Check that all relevant language packs are installed.
Slow processing -- Reduce the number of selected languages to only those present in the document.
