Supported File Formats
Kaizen OCR & PDF supports a wide range of input and output formats for OCR text extraction, document conversion, and image processing.
OCR Input Formats
The following file formats can be used as input for text extraction with Easy OCR and Advanced OCR:
| Format | Extension | Description |
|---|---|---|
| PNG | .png | Portable Network Graphics -- lossless, ideal for screenshots |
| JPEG | .jpg, .jpeg | Joint Photographic Experts Group -- compressed photographs |
| TIFF | .tif, .tiff | Tagged Image File Format -- high quality, multi-page support |
| BMP | .bmp | Bitmap -- uncompressed Windows image format |
| GIF | .gif | Graphics Interchange Format -- simple graphics |
| WebP | .webp | Modern image format by Google -- smaller file sizes |
| PDF | .pdf | Portable Document Format -- both native text and scanned |
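If you prepare files with a script before opening them in the application, a small pre-check can filter out unsupported inputs. The following is a minimal sketch, not part of Kaizen OCR & PDF itself: the names `OCR_INPUT_EXTENSIONS` and `is_supported_ocr_input` are hypothetical, and the check looks only at the file extension, not at the file contents.

```python
from pathlib import Path

# Extensions copied from the OCR input table; the set name is hypothetical.
OCR_INPUT_EXTENSIONS = {
    ".png", ".jpg", ".jpeg", ".tif", ".tiff",
    ".bmp", ".gif", ".webp", ".pdf",
}


def is_supported_ocr_input(filename: str) -> bool:
    """Return True if the file extension is listed as an OCR input format.

    The comparison is case-insensitive and inspects only the extension,
    so a mislabeled file would still pass this check.
    """
    return Path(filename).suffix.lower() in OCR_INPUT_EXTENSIONS


if __name__ == "__main__":
    for name in ["scan.TIFF", "report.pdf", "notes.docx"]:
        print(name, is_supported_ocr_input(name))
```

Files that fail this check, such as `.docx` documents, are not listed as OCR inputs in the table.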
Document Conversion Formats
The Convert Documents feature supports conversion between these formats:
Document Formats
| Format | Extension | Read | Write |
|---|---|---|---|
| PDF | .pdf | | |
| DOCX | .docx | | |
| DOC | .doc | | |
| RTF | .rtf | | |
| TXT | .txt | | |
| HTML | .html | | |
| ODT | .odt | | |
| EPUB | .epub | | |
Image Formats (for conversion)
| Format | Extension | Read | Write |
|---|---|---|---|
| PNG | .png | | |
| JPEG | .jpg | | |
| TIFF | .tif | | |
| BMP | .bmp | | |
| GIF | .gif | | |
PDF Image Extraction Output
When extracting images from PDF files, you can save them in:
| Format | Best For |
|---|---|
| PNG | Lossless quality, transparency, screenshots |
| JPEG | Smaller file size, photographs |
| BMP | Uncompressed bitmap, maximum compatibility |
| TIFF | High-quality archival and professional printing |
OCR Text Output Formats
After performing OCR, you can export extracted text as:
- Plain Text -- Copy to clipboard or save as .txt
- Structured Output -- Export with layout preservation for tables and columns
Tips for Best Input Quality
Image Resolution
For best OCR accuracy, use images with at least 150 DPI (dots per inch). Images at 300 DPI will give the most accurate results, especially for small text.
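To see how the DPI guidance translates into pixels, the sketch below computes the effective resolution of a scan from its pixel width and the physical width of the page, then rates it against the 150 and 300 DPI figures. It is an illustrative helper only: the function names and the rating labels are hypothetical and are not part of Kaizen OCR & PDF.

```python
def effective_dpi(pixels: int, inches: float) -> float:
    """Return dots per inch for an image edge of `pixels` covering `inches`."""
    if inches <= 0:
        raise ValueError("physical size must be positive")
    return pixels / inches


def rate_resolution(dpi: float) -> str:
    """Rate a resolution against the 150 DPI minimum and 300 DPI target."""
    if dpi >= 300:
        return "recommended"
    if dpi >= 150:
        return "acceptable"
    return "too low"


if __name__ == "__main__":
    # A US Letter page is 8.5 inches wide; a 2550-pixel-wide scan is 300 DPI.
    dpi = effective_dpi(2550, 8.5)
    print(dpi, rate_resolution(dpi))
```

By the same arithmetic, a 1275-pixel-wide scan of the same page comes out at 150 DPI, the stated minimum.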
PDF Processing
Kaizen OCR can handle both native text PDFs (text-based) and scanned PDFs (image-based). Use the appropriate mode in Advanced OCR for best results.
File Size
There is no strict file size limit. However, very large files (100+ MB) may take longer to process. For batch processing, consider splitting large documents into smaller sections.
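Before running a batch, it can help to identify which files cross the 100 MB mark so they can be split first. The following is a minimal sketch under stated assumptions: it treats 100 MB as 100 × 1024 × 1024 bytes, and the names `LARGE_FILE_BYTES`, `is_large`, and `find_large_files` are hypothetical helpers, not features of the application.

```python
import os

# Assumption: "100 MB" is interpreted as 100 * 1024 * 1024 bytes.
LARGE_FILE_BYTES = 100 * 1024 * 1024


def is_large(size_bytes: int) -> bool:
    """Return True if a file of this size may be slow to process."""
    return size_bytes >= LARGE_FILE_BYTES


def find_large_files(paths: list[str]) -> list[str]:
    """Return the subset of existing paths whose size meets the threshold."""
    return [p for p in paths if os.path.isfile(p) and is_large(os.path.getsize(p))]


if __name__ == "__main__":
    import sys

    # Usage: python check_sizes.py file1.pdf file2.tif ...
    for path in find_large_files(sys.argv[1:]):
        print(f"Consider splitting: {path}")
```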