Conversation

Liuziyu77

olmOCRBench is a document-understanding and OCR-oriented benchmark for evaluating large vision-language models (LVLMs) on real-world document parsing and text recognition.
This PR integrates olmOCRBench into VLMEvalKit, providing a complete evaluation pipeline for the benchmark's validation split.
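
Evaluations in VLMEvalKit are typically launched through the `run.py` entry point; below is a minimal sketch of what an olmOCRBench run might look like, wrapped in Python for self-containment. The dataset key `olmOCRBench` and model name `InternVL3_5-8B` are assumptions; check the registries touched by this PR for the exact identifiers.

```python
# A minimal sketch of launching the new benchmark via VLMEvalKit's run.py CLI.
# NOTE: the dataset key "olmOCRBench" and the model name "InternVL3_5-8B" are
# assumptions; the exact identifiers are defined in this PR's registries.
import subprocess

subprocess.run(
    ["python", "run.py", "--data", "olmOCRBench", "--model", "InternVL3_5-8B"],
    check=True,  # raise if the evaluation run exits with a non-zero status
)
```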

Evaluated results (InternVL3.5-8B):

| Model | ArXiv | Base | Hdr/Ftr | TinyTxt | MultCol | OldScan | OldMath | Tables | Overall |
|---|---|---|---|---|---|---|---|---|---|
| InternVL3.5-8B | 55.8 | 98.6 | 82.2 | 68.8 | 67.2 | 33.5 | 62.2 | 74.1 | 67.8 |

@Liuziyu77 (Author)

Support Ocean-OCR Bench and MAT-Bench.
Ocean-OCR Bench is a benchmark from Ocean-OCR; it measures edit distance, F1 score, precision, recall, BLEU, and METEOR.

| Model | Edit Distance↓ (en) | Edit Distance↓ (zh) | F1-Score↑ (en) | F1-Score↑ (zh) | Precision↑ (en) | Precision↑ (zh) | Recall↑ (en) | Recall↑ (zh) | BLEU↑ (en) | BLEU↑ (zh) | METEOR↑ (en) | METEOR↑ (zh) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| InternVL3.5 (GitHub code) | 0.1071 | 0.2971 | 0.7988 | 0.8078 | 0.8218 | 0.8506 | 0.7799 | 0.7823 | 0.6317 | 0.5198 | 0.7830 | 0.6971 |
| InternVL3.5 (VLMEvalKit) | 0.1387 | 0.3331 | 0.7640 | 0.7888 | 0.7889 | 0.8512 | 0.7462 | 0.7476 | 0.6141 | 0.4840 | 0.7497 | 0.6656 |
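
For context on the string-level metrics above, here is a minimal sketch, assuming a length-normalized Levenshtein distance and SQuAD-style bag-of-tokens overlap for precision/recall/F1. The benchmark's exact normalization and tokenization (especially for zh) may differ, and BLEU/METEOR would typically come from a standard toolkit such as nltk (not shown).

```python
# Minimal metric sketches; whitespace tokenization is an assumption and will
# not match the benchmark's handling of Chinese text exactly.
from collections import Counter


def normalized_edit_distance(pred: str, ref: str) -> float:
    """Levenshtein distance divided by the longer string's length (lower is better)."""
    m, n = len(pred), len(ref)
    dp = list(range(n + 1))  # one-row DP over the reference string
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(
                dp[j] + 1,                          # delete from pred
                dp[j - 1] + 1,                      # insert into pred
                prev + (pred[i - 1] != ref[j - 1])  # substitute (free on match)
            )
            prev = cur
    return dp[n] / max(m, n, 1)


def token_prf(pred: str, ref: str) -> tuple[float, float, float]:
    """Token-level precision, recall, and F1 via bag-of-tokens overlap."""
    pred_toks, ref_toks = pred.split(), ref.split()
    common = sum((Counter(pred_toks) & Counter(ref_toks)).values())
    if common == 0:
        return 0.0, 0.0, 0.0
    p, r = common / len(pred_toks), common / len(ref_toks)
    return p, r, 2 * p * r / (p + r)
```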

MAT-Bench (Multi-modal Agentic Tool Bench) is a benchmark with two settings, MAT-Search and MAT-Coding, designed to evaluate LVLMs' agentic search and coding abilities.

| Metric | Qwen2.5-VL-7B (VLMEvalKit) | Qwen2.5-VL-7B (GitHub repo) |
|---|---|---|
| code_f1_avg | 34.41 | 32.12 |
| code_em_avg | 22.00 | 21.50 |
| search_f1_avg | 50.16 | 53.49 |
| search_em_avg | 43.62 | 46.67 |
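
The `*_em_avg` columns are averages of per-sample exact match after text normalization, and `*_f1_avg` uses token-level F1 as sketched earlier. Below is a minimal sketch of EM, assuming SQuAD-style normalization; the PR's evaluator may normalize differently.

```python
# Exact-match sketch; the normalization rules here are an assumption.
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred: str, ref: str) -> float:
    return float(normalize(pred) == normalize(ref))


# Averaging over a split yields a score like search_em_avg (illustrative data):
preds, refs = ["The Eiffel Tower", "42"], ["Eiffel Tower", "43"]
em_avg = 100 * sum(exact_match(p, r) for p, r in zip(preds, refs)) / len(preds)
print(f"em_avg = {em_avg:.2f}")  # 50.00
```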
