Batch process all PDF files in a folder to make them searchable with OCR using ocrmypdf and a simple PowerShell script. Output files are saved in an 'output' subfolder. Perfect for Windows users needing fast PDF text recovery.

R0mb0/Batch_PDF_OCR_Processor

Batch PDF OCR Processor for Windows


Features

  • Processes all PDF files in the current folder
  • Runs OCR to make PDFs searchable (text layer added)
  • Outputs processed PDFs to an output subfolder

Prerequisites

  • Windows with PowerShell
  • Chocolatey
  • Python and pip
  • Tesseract
  • Ghostscript
  • ocrmypdf

Optional but Recommended

  • pngquant (for better image compression)
  • jbig2 (for advanced PDF compression, but see important Windows note below)

Step-by-Step Installation (Stupid-Proof)

1. Install Chocolatey

Chocolatey lets you install Windows programs from the command line.

  1. Open PowerShell as Administrator (right-click PowerShell > "Run as Administrator").

  2. Paste this command and press Enter:

    Set-ExecutionPolicy Bypass -Scope Process -Force; `
      [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; `
      iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
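
  3. Close and reopen PowerShell. Keep using an Administrator window for the choco install commands in the next steps, since Chocolatey generally needs elevated rights to install packages; a normal window is enough for pip and for running the script.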


2. Install Python and Pip

Using Chocolatey (in PowerShell):

choco install python -y
  • This will install Python and pip.
  • Close and reopen PowerShell after installation.
  • Test with:
    python --version
    pip --version

3. Install Required Packages (ocrmypdf, tesseract, ghostscript)

Install Tesseract and Ghostscript using Chocolatey:

choco install tesseract -y
choco install ghostscript -y

Install ocrmypdf (using pip):

pip install ocrmypdf
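
After the installs, it can help to confirm that each tool is reachable from a fresh PowerShell window. The following is a minimal sketch rather than part of the project: the executable name gswin64c for Ghostscript is an assumption (it is the usual name of the 64-bit console program), and a missing Ghostscript entry does not necessarily mean it is not installed, since its installer may not add it to the PATH.

```powershell
# Look up each tool on the PATH and report whether it was found.
# 'gswin64c' is an assumed name for the 64-bit Ghostscript console program.
$tools = 'python', 'pip', 'tesseract', 'gswin64c', 'ocrmypdf'

foreach ($tool in $tools) {
    $cmd = Get-Command $tool -ErrorAction SilentlyContinue
    if ($cmd) {
        Write-Host "$tool found at $($cmd.Source)"
    } else {
        Write-Warning "$tool not found on PATH"
    }
}
```

If a tool is reported missing right after installation, closing and reopening PowerShell so that the updated PATH is loaded is the first thing to try.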

4. (Optional) Install Additional Recommended Packages

pngquant

For better image compression, install:

choco install pngquant -y

jbig2 (Advanced, Optional, Not Directly Supported on Windows)

jbig2 is an optional dependency that can improve PDF compression.

  • Important: There is no official Windows binary and it is not available via Chocolatey.
  • If you require jbig2, you will need to manually compile it from source or find a trusted third-party binary for Windows. For most users, this step can be skipped.
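
To illustrate how these optional tools come into play: ocrmypdf has an --optimize option, and its higher levels (2 and 3) rely on pngquant. The sketch below picks a level depending on whether pngquant is on the PATH. The helper name Get-OptimizeLevel and the file names input.pdf and output\input.pdf are placeholders chosen for illustration, not something the project defines.

```powershell
# Hypothetical helper: suggest an --optimize level for ocrmypdf.
# Level 2 needs pngquant; level 1 works without it.
function Get-OptimizeLevel {
    if (Get-Command pngquant -ErrorAction SilentlyContinue) {
        return 2
    }
    return 1
}

$level = Get-OptimizeLevel

# Print the command instead of running it, because the file names are placeholders.
Write-Host "Suggested call: ocrmypdf --optimize $level input.pdf output\input.pdf"
```

Without jbig2, ocrmypdf still runs; the resulting files may simply be larger than they would be with it.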

5. Enable PowerShell Script Execution

IMPORTANT:
By default, Windows may prevent running scripts.
Before running the script, in PowerShell, execute:

Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass

This change is temporary and applies only to the current PowerShell window, so run it in the same window you will use to run the script.
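
To check that the setting took effect, the policy for each scope can be listed. This is a small sketch using standard PowerShell cmdlets; the commented alternative assumes the script is named ocr_batch.ps1 as in the Usage section.

```powershell
# List the execution policy for every scope; after the Set-ExecutionPolicy
# command, the 'Process' row should read 'Bypass' in this window only.
Get-ExecutionPolicy -List

# Alternative: start a separate PowerShell process with the bypass applied
# to that single run (left as a comment so nothing is executed here).
# powershell -ExecutionPolicy Bypass -File .\ocr_batch.ps1
```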


Usage

  1. Place ocr_batch.ps1 in the same folder as your PDFs.

  2. Open PowerShell in that folder (Shift + Right Click in the folder > "Open PowerShell window here").

  3. Run the script:

    .\ocr_batch.ps1
  4. Processed PDFs will appear in the output subfolder.
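
The contents of ocr_batch.ps1 are not reproduced on this page. As a rough illustration of the behaviour described above (every PDF in a folder is passed through ocrmypdf and written to an output subfolder), a script along these lines would do the job. The function name Invoke-BatchOcr and its parameters are hypothetical, and the real script may differ.

```powershell
# Hypothetical sketch; not the repository's actual ocr_batch.ps1.
function Invoke-BatchOcr {
    param(
        [string]$Folder = (Get-Location).Path,  # folder containing the PDFs
        [string]$OutputName = 'output'          # subfolder that receives the results
    )

    # Create the output subfolder if it does not exist yet.
    $outDir = Join-Path $Folder $OutputName
    New-Item -ItemType Directory -Path $outDir -Force | Out-Null

    # Only PDFs directly in $Folder are picked up; subfolders are not scanned.
    $pdfs = @(Get-ChildItem -Path $Folder -Filter '*.pdf' -File)
    $failed = 0

    foreach ($pdf in $pdfs) {
        $target = Join-Path $outDir $pdf.Name
        Write-Host "Processing $($pdf.Name)"

        # Basic ocrmypdf call: input file first, output file second.
        & ocrmypdf $pdf.FullName $target

        # ocrmypdf signals problems through a non-zero exit code.
        if ($LASTEXITCODE -ne 0) {
            Write-Warning "ocrmypdf exited with code $LASTEXITCODE for $($pdf.Name)"
            $failed++
        }
    }

    Write-Host "Done: $($pdfs.Count) file(s) found, $failed failure(s)."
}
```

Calling Invoke-BatchOcr with no arguments processes the current folder. Because the originals are left untouched and results go to a separate subfolder, the run can be repeated safely after fixing whatever caused a failure.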
