Description
The current design of `docling-eval` assumes the following workflow:
- `create-gt`: Create a Ground Truth (GT) dataset in HF parquet format.
- `create-eval`: Create a prediction dataset in HF parquet format that contains the predictions and the ground truth data from step 1.
- `evaluate`: Run evaluations on the prediction dataset created in step 2.
If the predictions already exist in lossless files such as DocTag or DoclingDocument json formats, it is still possible to use the previous workflow via the `FileProvider`. However, this still imposes unnecessary overhead because:
- It requires additional storage space to save the prediction parquet dataset.
- There is significant time spent in I/O to save the prediction dataset.
  - Quick runtime benchmarking shows that 15% of the time is spent converting DocTag files into DoclingDocument objects and 85% dumping the shards of the created prediction dataset.
An improved design should allow the direct evaluation of DocTag/json files without having to dump a prediction dataset to disk.
One approach could be:
- The user places the `dt` or `json` files in a directory.
  - Each `dt`/`json` file follows the naming convention `<document_id>.dt`, `<document_id>.json`.
  - `document_id` must be the same as the `document_id` column of the GT dataset.
- All evaluators must accept an optional parameter `external_predictions_path`. If present:
  - Each GT document is matched to a doctags/json file.
  - The `doctags` file is loaded and converted on-the-fly to a DoclingDocument object. The `json` file is deserialized into a DoclingDocument.
  - The evaluation proceeds between the GT-sourced doc and the prediction doc.
- The CLI for the `evaluate` command must accordingly be expanded to receive an optional parameter `--external-predictions-path`.
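The matching step in this proposal could look roughly like the sketch below. It only resolves file paths from the `<document_id>.dt` / `<document_id>.json` naming convention; loading the file into a DoclingDocument is left out on purpose. The function names `find_prediction_file` and `match_predictions`, and the choice to prefer `.json` over `.dt` when both exist, are assumptions made for illustration and are not part of `docling-eval`.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional

# Assumption: when both files exist for a document, prefer the json file.
PREDICTION_SUFFIXES = (".json", ".dt")


def find_prediction_file(document_id: str, predictions_dir: Path) -> Optional[Path]:
    """Return the prediction file for a GT document id, or None if absent."""
    for suffix in PREDICTION_SUFFIXES:
        candidate = predictions_dir / f"{document_id}{suffix}"
        if candidate.is_file():
            return candidate
    return None


def match_predictions(
    document_ids: Iterable[str], predictions_dir: Path
) -> Dict[str, Optional[Path]]:
    """Map every GT document id to its prediction file (None if unmatched)."""
    return {
        doc_id: find_prediction_file(doc_id, predictions_dir)
        for doc_id in document_ids
    }
```

Returning `None` for unmatched ids, rather than raising, leaves it to each evaluator to decide whether a missing prediction is an error or should be skipped.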
Note: This design allows evaluations to be parallelized by comparing batches of GT/predicted documents concurrently.
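A minimal sketch of such batched, concurrent evaluation, using only the Python standard library, is shown below. The names `batched` and `evaluate_concurrently`, the `evaluate_pair` callable, and the default batch size and worker count are all hypothetical; a thread pool is used here for simplicity, and a process pool may suit CPU-bound metrics better.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence, TypeVar

Pair = TypeVar("Pair")    # e.g. a (GT document, predicted document) tuple
Score = TypeVar("Score")  # whatever a single evaluation returns


def batched(items: Sequence[Pair], batch_size: int) -> Iterator[Sequence[Pair]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def evaluate_concurrently(
    pairs: Sequence[Pair],
    evaluate_pair: Callable[[Pair], Score],
    batch_size: int = 8,
    max_workers: int = 4,
) -> List[Score]:
    """Evaluate GT/prediction pairs batch by batch, keeping input order."""

    def evaluate_batch(batch: Sequence[Pair]) -> List[Score]:
        return [evaluate_pair(pair) for pair in batch]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order the batches were submitted.
        per_batch = pool.map(evaluate_batch, batched(pairs, batch_size))
        return [score for scores in per_batch for score in scores]
```

Because the results come back in input order, per-document scores can be joined back to the GT `document_id` values without extra bookkeeping.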