Advanced OCR
Advanced OCR adds image preprocessing filters that can improve text extraction accuracy on difficult images. Use it when Simple OCR does not produce satisfactory results.
When to Use Advanced OCR
Advanced OCR is designed for challenging scenarios:
- Photographs taken at an angle or in poor lighting
- Scanned documents with skew, noise, or low contrast
- Images with complex layouts (columns, tables, mixed text and graphics)
- Historical or degraded documents with faded text
- Images with textured or colored backgrounds
Preprocessing Options
Before running OCR, you can apply one or more preprocessing filters to improve the image:
Deskew
Automatically straightens a tilted image so text lines are horizontal. This is critical for scanned documents that were fed into the scanner at a slight angle.
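To make the idea concrete, here is a minimal sketch of one common way to estimate skew, the projection-profile method. This page does not say which algorithm the application uses, so treat the function names `profile_sharpness` and `estimate_skew`, the search range, and the step size as illustrative assumptions. The image is assumed to be a list of rows of grayscale values in which ink pixels are 0.

```python
import math

def profile_sharpness(ink_points, angle_deg):
    """Score how well ink lines up in rows once a candidate tilt is undone."""
    slope = math.tan(math.radians(angle_deg))
    row_counts = {}
    for x, y in ink_points:
        # Shift each pixel vertically by the candidate slope. This shear
        # approximates a rotation for the small angles typical of scans.
        row = round(y - x * slope)
        row_counts[row] = row_counts.get(row, 0) + 1
    # The sum of squares is largest when ink concentrates in few rows.
    return sum(c * c for c in row_counts.values())

def estimate_skew(image, max_angle=5.0, step=0.5):
    """Return the candidate angle (degrees) that best aligns ink into rows.

    Hypothetical helper: `image` is a list of rows of 0-255 values and
    only pixels equal to 0 count as ink.
    """
    ink = [(x, y)
           for y, row in enumerate(image)
           for x, p in enumerate(row) if p == 0]
    steps = int(round(max_angle / step))
    best_angle, best_score = 0.0, -1
    for i in range(-steps, steps + 1):
        angle = i * step
        score = profile_sharpness(ink, angle)
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

The returned value is the tilt of the text lines in image coordinates (y grows downward), so straightening the image means rotating it by the opposite angle.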
Binarization
Converts the image to pure black and white, which can substantially improve OCR accuracy on images with varying contrast or colored backgrounds.
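As an illustration of what binarization involves, the sketch below uses Otsu's method, a widely used way to pick a global threshold automatically. Whether the application uses Otsu's method, an adaptive threshold, or something else is not stated here, so this is an assumption, and the names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a flat list of 0-255 grayscale values."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of their gray values
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance: large when the two groups are well separated.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def binarize(image):
    """Map every pixel to 0 (dark) or 255 (light) using the Otsu threshold."""
    t = otsu_threshold([p for row in image for p in row])
    return [[255 if p > t else 0 for p in row] for row in image]
```

A single global threshold works well when lighting is even. Unevenly lit photographs usually need a locally adaptive threshold, which this sketch does not cover.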
Noise Removal
Removes specks, dots, and other artifacts that can confuse the OCR engine. Useful for scanned photocopies or faxes.
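One simple way to remove isolated specks is a 3x3 median filter, sketched below. This is a generic technique shown for illustration, not a description of the application's actual filter, and `median_filter` is a hypothetical name. The image is assumed to be a list of rows of grayscale values.

```python
def median_filter(image):
    """Replace each pixel with the median of its 3x3 neighborhood.

    Neighborhoods are clipped at the image border, so edge pixels use
    fewer than nine samples.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            window = [image[yy][xx]
                      for yy in range(max(0, y - 1), min(h, y + 2))
                      for xx in range(max(0, x - 1), min(w, x + 2))]
            window.sort()
            out[y][x] = window[len(window) // 2]
    return out
```

A single dark pixel on a light background is outvoted by its neighbors and disappears, while the edge of a larger dark region stays in place. Very thin strokes can also be eroded, which is one reason to check the preview before running OCR.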
Contrast Enhancement
Increases the difference between text and background, making faded or light text more readable.
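The simplest form of contrast enhancement is a linear stretch that maps the darkest pixel to 0 and the lightest to 255. The sketch below shows that idea only. The application may use a different method (for example histogram equalization), and `stretch_contrast` is a hypothetical name.

```python
def stretch_contrast(image):
    """Linearly rescale gray values so they span the full 0-255 range."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A uniform image has no contrast to stretch; return a copy.
        return [row[:] for row in image]
    # Integer arithmetic keeps the result exact and within 0-255.
    return [[(p - lo) * 255 // (hi - lo) for p in row] for row in image]
```

For faded text where ink is around 100 and paper around 200, the stretch moves ink to 0 and paper to 255, which widens the gap the OCR engine has to work with.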
Scaling
Enlarges small text to a size the OCR engine can process more accurately. The engine works best when text characters are at least 20 pixels tall.
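The 20-pixel guideline can be turned into a quick calculation. The sketch below computes the smallest whole-number enlargement that brings characters up to that height and applies it with nearest-neighbor resampling. Restricting the factor to whole numbers and using nearest-neighbor are simplifications chosen for brevity, and both function names are hypothetical. Real tools typically use smoother interpolation.

```python
import math

MIN_CHAR_HEIGHT = 20  # pixels, from the guideline above

def scale_factor(char_height_px, target=MIN_CHAR_HEIGHT):
    """Smallest integer factor that makes characters at least `target` px tall."""
    if char_height_px <= 0:
        raise ValueError("character height must be positive")
    return max(1, math.ceil(target / char_height_px))

def upscale_nearest(image, factor):
    """Enlarge an image by repeating each pixel `factor` times in both directions."""
    return [[p for p in row for _ in range(factor)]
            for row in image
            for _ in range(factor)]
```

For example, text that is 8 pixels tall needs a factor of 3 (8 x 3 = 24 pixels), while text that is already 20 pixels or taller gets a factor of 1 and is left at its original size.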
How to Use Advanced OCR
- Load your image -- Open or drag and drop the image file.
- Open Advanced OCR -- Click the Advanced OCR option in the sidebar.
- Apply preprocessing -- Select one or more preprocessing options. A preview updates in real time so you can see the effect.
- Select OCR language -- Choose the appropriate language from the dropdown.
- Run OCR -- Click Extract Text to process the preprocessed image.
- Review results -- Compare the output with the original image and make edits as needed.

Experiment with Filters
Different images respond differently to preprocessing. Try various combinations to find what works best for your specific document type. Deskew + Binarization is a good starting combination for most scanned documents.
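If you want to work through combinations systematically rather than ad hoc, a short script can list them for you as a checklist. The sketch below is only a planning aid: the application is operated through its interface, and nothing here calls into it. The filter identifiers and the order they are listed in are assumptions made for illustration.

```python
from itertools import combinations

# Filter names from this page, listed in one plausible application order.
# The order is an assumption, not something the application prescribes.
FILTERS = ["deskew", "scaling", "contrast", "noise_removal", "binarization"]

def filter_combinations(filters=FILTERS, max_size=None):
    """Yield every non-empty combination of filters, smallest sets first."""
    if max_size is None:
        max_size = len(filters)
    for size in range(1, max_size + 1):
        for combo in combinations(filters, size):
            yield combo

if __name__ == "__main__":
    # Print a checklist of single filters and pairs to try.
    for combo in filter_combinations(max_size=2):
        print(" + ".join(combo))
```

Limiting `max_size` to 2 keeps the checklist to 15 entries, including the deskew and binarization pair suggested above. All five filters give 31 combinations in total.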