Supported Formats
OCR & PDF Tools handles a wide range of file formats for both input (OCR source files) and output (converted or exported files).
Image Formats (OCR Input)
These image formats can be loaded for text extraction:
| Format | Extension | Notes |
|---|---|---|
| PNG | .png | Recommended for screenshots and digital images |
| JPEG | .jpg, .jpeg | Common photo format, good for scanned documents |
| TIFF | .tif, .tiff | Professional scanning format, supports multi-page |
| BMP | .bmp | Uncompressed Windows bitmap |
| GIF | .gif | Basic support, not ideal for OCR |
| WebP | .webp | Modern web image format |
Best Format for OCR
For the highest OCR accuracy, use PNG or TIFF images at 300 DPI or higher. These formats preserve detail without compression artifacts that can reduce recognition quality.
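As an illustration of the table and recommendation above, the following sketch classifies a file name by its extension. It is not the tool's actual logic: the function name `classify_ocr_input` and both sets are hypothetical, with the extensions copied from this page.

```python
from pathlib import Path

# Extensions copied from the image format table on this page.
# How OCR & PDF Tools detects formats internally is not documented here.
OCR_INPUT_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".gif", ".webp",
}

# Formats this page recommends for the highest OCR accuracy.
RECOMMENDED_EXTENSIONS = {".png", ".tif", ".tiff"}


def classify_ocr_input(filename: str) -> str:
    """Return 'recommended', 'supported', or 'unsupported' for a file name."""
    ext = Path(filename).suffix.lower()  # case-insensitive comparison
    if ext not in OCR_INPUT_EXTENSIONS:
        return "unsupported"
    if ext in RECOMMENDED_EXTENSIONS:
        return "recommended"
    return "supported"
```

A check like this looks only at the file name, so a mislabeled file would still pass; it is meant as a quick pre-filter before loading images.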
PDF Support
| Operation | Details |
|---|---|
| Text extraction | Reads embedded text from digitally-created PDFs |
| OCR on PDF pages | Converts scanned PDF pages to images for OCR processing |
| Merge | Combine multiple PDFs into one |
| Split | Divide a PDF into multiple files |
| Password | Add or remove password protection |
| Convert from | PDF to DOCX, RTF, HTML, EPUB, PNG, JPG, TIFF |
| Convert to | Images and other documents to PDF |
Document Formats
| Format | Extension | Read | Write |
|---|---|---|---|
| PDF | .pdf | Yes | Yes |
| Word (DOCX) | .docx | Yes | Yes |
| Rich Text | .rtf | Yes | Yes |
| HTML | .html | Yes | Yes |
| EPUB | .epub | Yes | Yes |
| Plain Text | .txt | Yes | Yes |
Export Formats for OCR Results
After extracting text with OCR, you can save the results in:
- Plain text (.txt) -- Raw text with no formatting
- Word document (.docx) -- Text with basic formatting preserved
- Rich text (.rtf) -- Compatible with most word processors
- HTML (.html) -- For web publishing
Multi-Page TIFF
OCR & PDF Tools supports multi-page TIFF files, which are common in professional scanning environments. Each page in the TIFF is treated as a separate image for OCR processing.
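In the TIFF format, each page is stored as its own image file directory (IFD), and the IFDs are chained by byte offsets. The sketch below counts pages by walking that chain using only the Python standard library. It is an independent illustration, not how OCR & PDF Tools reads TIFF files; the function name `count_tiff_pages` is hypothetical, and the code assumes a classic (non-BigTIFF) file whose top-level IFDs each represent one page.

```python
import struct


def count_tiff_pages(data: bytes) -> int:
    """Count top-level IFDs (pages) in the bytes of a classic TIFF file."""
    if len(data) < 8:
        raise ValueError("too short to be a TIFF file")

    # The first two bytes give the byte order: "II" little-endian, "MM" big-endian.
    if data[:2] == b"II":
        endian = "<"
    elif data[:2] == b"MM":
        endian = ">"
    else:
        raise ValueError("missing TIFF byte-order mark")

    # Next come the magic number 42 and the offset of the first IFD.
    magic, offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF is not handled here)")

    pages = 0
    seen = set()  # guards against offset loops in corrupt files
    while offset != 0:
        if offset in seen or offset + 2 > len(data):
            raise ValueError("corrupt IFD chain")
        seen.add(offset)

        # An IFD is a 2-byte entry count, 12 bytes per entry,
        # then a 4-byte offset to the next IFD (0 marks the last page).
        (entry_count,) = struct.unpack(endian + "H", data[offset:offset + 2])
        next_ptr = offset + 2 + 12 * entry_count
        if next_ptr + 4 > len(data):
            raise ValueError("truncated IFD")
        (offset,) = struct.unpack(endian + "I", data[next_ptr:next_ptr + 4])
        pages += 1
    return pages
```

For a file on disk, the function could be called as `count_tiff_pages(Path("scan.tif").read_bytes())`, where `scan.tif` is a placeholder name. Reading the whole file into memory is acceptable for a sketch but not ideal for very large scans.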
File Size Limits
There are no hard file size limits, but very large files (hundreds of MB) may require more processing time and memory. For optimal performance:
- Keep individual images under 50 MB
- Split very large PDFs before processing
- Scan at 300 DPI (much higher resolutions increase file size with little or no gain in OCR accuracy)
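To see why resolution matters for the size guideline above, the following sketch estimates the uncompressed size of a scanned page and compares it with the 50 MB figure. The function names are hypothetical, "50 MB" is read here as 50 × 1024 × 1024 bytes, and 24-bit color is assumed. Files saved as PNG, JPEG, or compressed TIFF will normally be smaller than this raw estimate.

```python
# Assumption: the 50 MB guideline is interpreted as 50 MiB.
GUIDELINE_BYTES = 50 * 1024 * 1024


def estimate_raw_bytes(width_in: float, height_in: float, dpi: int,
                       bytes_per_pixel: int = 3) -> int:
    """Estimate the uncompressed size of a scan of the given physical size."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


def exceeds_guideline(num_bytes: int) -> bool:
    """Return True if a size is above the 50 MB per-image guideline."""
    return num_bytes > GUIDELINE_BYTES


# A US Letter page (8.5 x 11 inches) in 24-bit color:
letter_300 = estimate_raw_bytes(8.5, 11, 300)  # 2550 x 3300 pixels
letter_600 = estimate_raw_bytes(8.5, 11, 600)  # 5100 x 6600 pixels
```

Doubling the DPI quadruples the pixel count, so under these assumptions the 300 DPI page stays below the guideline while the 600 DPI page exceeds it before compression.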