A benchmarking tool that compares the OCR and data extraction capabilities of different large multimodal models, such as gpt-4o, evaluating both text and JSON extraction accuracy. The goal of this benchmark is to publish a comprehensive comparison of OCR accuracy across traditional OCR providers and multimodal language models. The evaluation dataset and methodology are fully open source, and we encourage expanding this benchmark to encompass additional providers.
Open Source LLM Benchmark Results (Mar 2025) | Dataset
Benchmark Results (Feb 2025) | Dataset
The primary goal is to evaluate JSON extraction from documents. To do this, the Omni benchmark runs Document ⇒ OCR ⇒ Extraction, measuring how well a model can OCR a page and return that content in a format an LLM can parse.
We use a modified json-diff to identify differences between predicted and ground truth JSON objects. You can review the `evaluation/json.ts` file to see the exact implementation. Accuracy is calculated as:
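The exact computation lives in `evaluation/json.ts`; as a rough sketch, assuming each field-level difference reported by the modified json-diff counts as one incorrect field:

```math
\text{JSON accuracy} = 1 - \frac{\text{number of mismatched fields}}{\text{total fields in the ground truth}}
```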
While the primary benchmark metric is JSON accuracy, we have included Levenshtein distance as a measurement of text similarity between extracted and ground truth text. Lower distance indicates higher similarity. Note that this scoring method heavily penalizes accurate text that does not conform to the exact layout of the ground truth data.

For example, an LLM could decode two blocks of header text without any issue, and all of the information may be 100% accurate, but a slight rearrangement of the header fields (address, phone number, etc.) still results in a large edit distance.
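To see this effect in isolation, here is a small standalone sketch (not the benchmark's scoring code) using a standard dynamic-programming Levenshtein implementation and two made-up header blocks that contain identical information in a different order:

```typescript
// Standard dynamic-programming Levenshtein distance (illustrative only).
function levenshtein(a: string, b: string): number {
  const dp: number[] = Array.from({ length: b.length + 1 }, (_, i) => i);
  for (let i = 1; i <= a.length; i++) {
    let prev = dp[0]; // dp[i-1][j-1]
    dp[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const temp = dp[j]; // dp[i-1][j]
      dp[j] =
        a[i - 1] === b[j - 1]
          ? prev
          : 1 + Math.min(prev, dp[j], dp[j - 1]);
      prev = temp;
    }
  }
  return dp[b.length];
}

// Same facts, different layout: every field is correct, only the order changed.
const groundTruth = "Acme Corp\n123 Main St\n(555) 010-0000";
const extracted = "(555) 010-0000\nAcme Corp\n123 Main St";

console.log(levenshtein(groundTruth, extracted));
```

Both strings carry exactly the same fields, yet the reordering alone produces a large edit distance even though no field is wrong.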
- Clone the repo and install dependencies: `npm install`
- Prepare your test data:
  - For local data, add individual files to the `data` folder.
  - To pull from a DB, add `DATABASE_URL` to your `.env`.
- Copy the `models.example.yaml` file to `models.yaml`. Set up API keys in `.env` for the models you want to test (see the example `.env` sketch after this list). Check out the supported models here.
- Run the benchmark: `npm run benchmark`
- Results will be saved in the `results/<timestamp>/results.json` file.
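As a reference for the test-data and API-key steps above, a minimal `.env` might look like the following. The variable names come from the provider tables below; every value shown is a placeholder, and the connection-string format is only an example:

```
# Only set keys for the providers you plan to benchmark (see the tables below).
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=...
GOOGLE_GENERATIVE_AI_API_KEY=...

# Optional: pull test documents from a database instead of the local data folder.
DATABASE_URL=postgres://user:password@localhost:5432/benchmark
```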
To enable specific models, create a `models.yaml` file in the `src` directory. Check out the `models.example.yaml` file for the required variables.
```yaml
models:
  - ocr: gemini-2.0-flash-001 # The model to use for OCR
    extraction: gpt-4o # The model to use for JSON extraction
  - ocr: gpt-4o
    extraction: gpt-4o
    directImageExtraction: true # Whether to use the model's native image extraction capabilities
```

You can view the configuration for each model in the `src/models/` folder.
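To make the config's role concrete, here is a minimal sketch of a harness loop over these model pairs. It is not the repository's actual code: `runOcr` and `runExtraction` are hypothetical stand-ins for the provider calls, the `js-yaml` dependency is assumed, and the branching on `directImageExtraction` reflects one reading of that flag (extract JSON straight from the image instead of from OCR output).

```typescript
import fs from "node:fs";
import yaml from "js-yaml"; // assumed dependency for parsing the YAML config

interface ModelConfig {
  ocr?: string;
  extraction?: string;
  directImageExtraction?: boolean;
}

// Placeholder provider calls; a real harness wires these to each provider's SDK.
async function runOcr(model: string, document: Buffer): Promise<string> {
  throw new Error(`OCR call for ${model} not implemented in this sketch`);
}
async function runExtraction(model: string, input: string | Buffer): Promise<unknown> {
  throw new Error(`Extraction call for ${model} not implemented in this sketch`);
}

async function benchmarkDocument(document: Buffer): Promise<void> {
  const { models } = yaml.load(fs.readFileSync("src/models.yaml", "utf8")) as {
    models: ModelConfig[];
  };

  for (const config of models) {
    if (config.directImageExtraction && config.extraction) {
      // Skip OCR and extract JSON directly from the image.
      console.log(config.extraction, await runExtraction(config.extraction, document));
    } else if (config.ocr && config.extraction) {
      // Document => OCR => Extraction.
      const ocrText = await runOcr(config.ocr, document);
      console.log(`${config.ocr} -> ${config.extraction}`, await runExtraction(config.extraction, ocrText));
    }
  }
}
```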
| Model Provider | Models | OCR | JSON Extraction | Required ENV Variables |
|---|---|---|---|---|
| Anthropic | `claude-3-5-sonnet-20241022` | ✅ | ✅ | `ANTHROPIC_API_KEY` |
| OpenAI | `gpt-4o` | ✅ | ✅ | `OPENAI_API_KEY` |
| Gemini | `gemini-2.0-flash-001`, `gemini-1.5-pro`, `gemini-1.5-flash` | ✅ | ✅ | `GOOGLE_GENERATIVE_AI_API_KEY` |
| Mistral | `mistral-ocr` | ✅ | ❌ | `MISTRAL_API_KEY` |
| OmniAI | `omniai` | ✅ | ✅ | `OMNIAI_API_KEY`, `OMNIAI_API_URL` |
| Model Provider | Models | OCR | JSON Extraction | Required ENV Variables |
|---|---|---|---|---|
| Gemma 3 | `google/gemma-3-27b-it` | ✅ | ❌ | |
| Qwen 2.5 | `qwen2.5-vl-32b-instruct`, `qwen2.5-vl-72b-instruct` | ✅ | ❌ | |
| Llama 3.2 | `meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo`, `meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo` | ✅ | ❌ | |
| ZeroX | `zerox` | ✅ | ✅ | `OPENAI_API_KEY` |
| Model Provider | Models | OCR | JSON Extraction | Required ENV Variables |
|---|---|---|---|---|
| AWS | `aws-text-extract` | ✅ | ❌ | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION` |
| Azure | `azure-document-intelligence` | ✅ | ❌ | `AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT`, `AZURE_DOCUMENT_INTELLIGENCE_KEY` |
| Google | `google-document-ai` | ✅ | ❌ | `GOOGLE_LOCATION`, `GOOGLE_PROJECT_ID`, `GOOGLE_PROCESSOR_ID`, `GOOGLE_APPLICATION_CREDENTIALS_PATH` |
| Unstructured | `unstructured` | ✅ | ❌ | `UNSTRUCTURED_API_KEY` |
- LLMs are instructed to use fixed system prompts for OCR and JSON extraction.
- For Google Document AI, you need to include `google_credentials.json` in the `data` folder.
You can use the benchmark dashboard to easily view the results of each test run. Check out the dashboard documentation for more details.
This project is licensed under the MIT License - see the LICENSE file for details.