
PDF Text Extraction

Extract text directly from PDF documents without needing OCR. This feature reads the embedded text layer in a PDF, which is faster and more accurate than OCR for digitally created PDFs.

How It Works

PDFs can contain text in two ways:

  1. Embedded text -- Digitally created PDFs (from Word, web pages, etc.) contain actual text data that can be extracted directly
  2. Scanned images -- Scanned PDFs contain only images of pages and require OCR to extract text

PDF Text Extraction handles the first type. For scanned PDFs, use Simple OCR or Advanced OCR.

Extracting Text from a PDF

  1. Click PDF Extract in the sidebar.
  2. Click Open and select your PDF file, or drag and drop it onto the window.
  3. The PDF loads and a preview of the page is displayed.
  4. Choose extraction options:
    • All pages -- Extract text from the entire document
    • Page range -- Specify one or more pages or ranges, separated by commas (e.g., pages 1-5, 10, 15-20)
    • Current page -- Extract only from the displayed page
  5. Click Extract.
  6. The extracted text appears in the output panel.
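
The application's exact page-range rules are not documented here. As a sketch, assuming pages are numbered from 1, ranges are inclusive, entries are comma-separated, and overlapping entries are merged, a spec like `1-5, 10, 15-20` could be expanded as shown below. `parse_page_range` is a hypothetical helper for illustration, not part of the tool.

```python
def parse_page_range(spec: str, page_count: int) -> list[int]:
    """Expand a spec such as "1-5, 10, 15-20" into sorted 1-based page numbers.

    Hypothetical helper: assumes inclusive ranges and comma-separated entries.
    """
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas, e.g. "1-3,,5"
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start > end:
            raise ValueError(f"range runs backwards: {part!r}")
        if start < 1 or end > page_count:
            raise ValueError(f"{part!r} is outside pages 1-{page_count}")
        pages.update(range(start, end + 1))  # a set merges overlapping entries
    return sorted(pages)


print(parse_page_range("1-5, 10, 15-20", page_count=20))
# → [1, 2, 3, 4, 5, 10, 15, 16, 17, 18, 19, 20]
```

Validating against the page count up front means a typo such as `15-200` fails immediately instead of silently extracting fewer pages than intended.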


Working with the Extracted Text

After extraction, you can:

  • Copy to clipboard -- Click Copy or press Ctrl+C
  • Save as text file -- Export to .txt format
  • Save as other formats -- Export to DOCX, RTF, or HTML using the File Conversion feature
  • Edit -- Make corrections directly in the output panel

Handling Mixed PDFs

Some PDFs contain both embedded text and scanned pages. The extraction tool pulls text only from the pages that have embedded text. Run OCR separately on the scanned pages.
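
To work out which pages of a mixed PDF still need OCR, one rough approach is to look at how much text each page yielded. The sketch below assumes you already have the extracted text as one string per page. The 20-character threshold and both function names are hypothetical choices, not settings of the application.

```python
def pages_needing_ocr(page_texts: list[str], min_chars: int = 20) -> list[int]:
    """Return 1-based numbers of pages whose extracted text is (nearly) empty."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars  # little or no embedded text: likely a scan
    ]


def to_range_spec(pages: list[int]) -> str:
    """Collapse sorted page numbers into a compact spec such as "2-3, 5"."""
    parts: list[str] = []
    i = 0
    while i < len(pages):
        start = end = pages[i]
        # Extend the current run while the next page number is consecutive.
        while i + 1 < len(pages) and pages[i + 1] == end + 1:
            i += 1
            end = pages[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ", ".join(parts)


texts = [
    "Quarterly report: revenue grew in all regions.",
    "",
    "   ",
    "Appendix A lists the data sources used.",
    "",
]
scanned = pages_needing_ocr(texts)
print(scanned)                 # → [2, 3, 5]
print(to_range_spec(scanned))  # → 2-3, 5
```

A page with a short caption over a scanned image can still pass the threshold, so treat the result as a starting list to review rather than a definitive split.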

Check the Output

If the extracted text is garbled or missing, the PDF likely contains scanned images rather than embedded text. Switch to Simple OCR or Advanced OCR for those pages.
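
If you post-process extracted text yourself, a rough heuristic (assumed here, not taken from the application) is to measure how much of it consists of characters that rarely appear in real text: U+FFFD replacement characters, private-use code points (common when a font has no Unicode mapping), unassigned code points, and control characters. `suspicious_ratio`, `looks_garbled`, and the 10% threshold are hypothetical.

```python
import unicodedata


def suspicious_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that often signal a broken text layer."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 1.0  # no text at all: treat as fully suspicious (likely a scan)
    bad = sum(
        1
        for c in chars
        if c == "\ufffd"  # Unicode replacement character
        or unicodedata.category(c) in {"Co", "Cn", "Cc"}  # private-use, unassigned, control
    )
    return bad / len(chars)


def looks_garbled(text: str, threshold: float = 0.1) -> bool:
    """Flag text whose suspicious-character ratio exceeds the (hypothetical) threshold."""
    return suspicious_ratio(text) > threshold


print(looks_garbled("Invoice total: 42.00 EUR"))  # → False
print(looks_garbled("\ue000\ue001\ue002 abc"))    # → True
```

Text that decodes to plausible but wrong letters passes this check, so it catches only some extraction failures; a visual comparison with the page preview is still the most reliable test.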

Large Documents

For PDFs with hundreds of pages, extraction may take a few moments. A progress bar shows the current status. You can cancel the operation at any time without losing already-extracted text.
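
One plausible way to get this behavior, sketched here as an assumption rather than a description of the application's internals, is to extract page by page and check a cancellation flag between pages, so that every completed page is kept. `extract_pages` and the `extract_page` callback are hypothetical names.

```python
import threading
from typing import Callable


def extract_pages(
    page_count: int,
    extract_page: Callable[[int], str],
    cancel: threading.Event,
    on_progress: Callable[[int, int], None] = lambda done, total: None,
) -> list[str]:
    """Extract pages one at a time, stopping early if `cancel` is set.

    Text from pages finished before cancellation is returned, not discarded.
    """
    results: list[str] = []
    for number in range(1, page_count + 1):
        if cancel.is_set():
            break  # keep whatever has been extracted so far
        results.append(extract_page(number))
        on_progress(number, page_count)  # e.g. drive a progress bar
    return results


# Usage with a stand-in extractor that requests cancellation during page 3.
cancel = threading.Event()


def fake_extract(number: int) -> str:
    if number == 3:
        cancel.set()  # simulates the user clicking Cancel while page 3 is processed
    return f"text of page {number}"


partial = extract_pages(10, fake_extract, cancel, lambda d, t: print(f"{d}/{t}"))
print(partial)  # → ['text of page 1', 'text of page 2', 'text of page 3']
```

Because the flag is checked only between pages, the page in progress finishes before the loop stops; a `threading.Event` is used so another thread (such as a UI handler) can set it safely.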

Password-Protected PDFs

If the PDF requires a password, you will be prompted to enter it before extraction. If you do not know the password, see Remove PDF Password.


Get OCR & PDF Tools