Batch-process all PDF files in a folder with OCR to make them searchable, using ocrmypdf and a simple PowerShell script. Output files are saved in an output subfolder. Useful for Windows users who need to recover text from scanned PDFs quickly.
- Processes all PDF files in the current folder
- Runs OCR to make PDFs searchable (text layer added)
- Outputs processed PDFs to an output subfolder
- Windows 10/11
- PowerShell (already included in Windows)
- Chocolatey package manager (for easy installation)
- Python 3 (with pip)
- Tesseract-OCR
- Ghostscript
- ocrmypdf
- pngquant (for better image compression)
- jbig2 (optional, for advanced PDF compression; see the important Windows note below)
Chocolatey lets you install Windows programs from the command line.
- Open PowerShell as Administrator (right-click PowerShell > "Run as Administrator").
- Paste this command and press Enter:
Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))
- Close and reopen PowerShell. Keep using an Administrator window for the choco install commands below; a normal window is enough for pip and for running the script.
Install Python using Chocolatey (in PowerShell):
choco install python -y
- This will install Python and pip.
- Close and reopen PowerShell after installation.
- Test with:
python --version
pip --version
Install Tesseract and Ghostscript using Chocolatey:
choco install tesseract -y
choco install ghostscript -y
Install ocrmypdf (using pip):
pip install ocrmypdf
For better image compression, install:
choco install pngquant -y
jbig2 is an optional dependency that can improve PDF compression.
- Important: There is no official Windows binary and it is not available via Chocolatey.
- If you require jbig2, you will need to manually compile it from source or find a trusted third-party binary for Windows. For most users, this step can be skipped.
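Once the installation steps above are done, it can help to confirm that each tool is reachable from PowerShell before running any OCR. The following is a small sketch and not part of the original setup: Get-ToolStatus is a hypothetical helper name, and gswin64c is assumed to be the Ghostscript console executable on 64-bit Windows (the name on PATH may differ depending on how Ghostscript was installed).

```powershell
# Hypothetical helper: report whether each command-line tool can be found on PATH.
function Get-ToolStatus {
    param([string[]]$Name)

    foreach ($tool in $Name) {
        # With SilentlyContinue, Get-Command returns nothing instead of failing
        # when the tool is missing.
        $cmd = Get-Command -Name $tool -CommandType Application -ErrorAction SilentlyContinue |
            Select-Object -First 1
        [pscustomobject]@{
            Tool  = $tool
            Found = [bool]$cmd
            Path  = if ($cmd) { $cmd.Source } else { $null }
        }
    }
}

# gswin64c is an assumed executable name for Ghostscript; adjust if yours differs.
Get-ToolStatus -Name 'python', 'pip', 'tesseract', 'gswin64c', 'ocrmypdf', 'pngquant' |
    Format-Table -AutoSize
```

Any row with Found set to False usually means the tool was not installed or PowerShell was not reopened after installation, so PATH is stale.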
IMPORTANT: By default, Windows may block PowerShell scripts from running.
Before running the script, execute this in PowerShell:
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
This change is temporary and applies only to the current PowerShell window.
- Place ocr_batch.ps1 in the same folder as your PDFs.
- Open PowerShell in that folder (Shift + right-click in the folder > "Open PowerShell window here").
- Run the script:
.\ocr_batch.ps1
- Processed PDFs will appear in the output subfolder.
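The contents of ocr_batch.ps1 are not listed here. As a rough idea of what such a script does, here is a minimal sketch under stated assumptions: it is not the original script, Get-OcrJob is a hypothetical helper name, and ocrmypdf is assumed to be on PATH and called with only an input and an output path.

```powershell
# ocr_batch.ps1 (illustrative sketch, not the original script)

# Hypothetical helper: pair each PDF directly inside $Folder with its
# target path in the "output" subfolder. Subfolders are not searched,
# so PDFs already written to "output" are not picked up again.
function Get-OcrJob {
    param([string]$Folder = (Get-Location).Path)

    $outDir = Join-Path $Folder 'output'
    Get-ChildItem -LiteralPath $Folder -Filter '*.pdf' -File |
        Where-Object { $_.Extension -ieq '.pdf' } |
        ForEach-Object {
            [pscustomobject]@{
                Source = $_.FullName
                Target = Join-Path $outDir $_.Name
            }
        }
}

$pdfJobs = @(Get-OcrJob)

if ($pdfJobs.Count -gt 0) {
    # Create the output subfolder if it does not exist yet.
    $outDir = Split-Path -Parent $pdfJobs[0].Target
    New-Item -ItemType Directory -Path $outDir -Force | Out-Null
}

foreach ($job in $pdfJobs) {
    Write-Host "OCR: $($job.Source)"
    # Assumption: plain "ocrmypdf <input> <output>" with no extra options.
    ocrmypdf $job.Source $job.Target
    if ($LASTEXITCODE -ne 0) {
        Write-Warning "ocrmypdf exited with code $LASTEXITCODE for $($job.Source)"
    }
}
```

Checking the exit code per file lets the loop continue past a PDF that ocrmypdf rejects (for example, one that already has a text layer) instead of stopping the whole batch.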