Official repository for the paper “A Supervised Framework for Document Processing at Scale with Large Language Models in Credit-Risk Research” (ICMIE 2025). Includes Colab notebooks, JSON evaluation data, and reproducibility materials.

cojocarucosmin/supervised-llm-document-framework


A Supervised Framework for Document Processing at Scale with Large Language Models in Credit-Risk Research

Authors:
Cosmin Cojocaru¹ and Sorin Ionescu²
¹ National University of Science and Technology Politehnica Bucharest, Romania
² National University of Science and Technology Politehnica Bucharest, Romania


📘 Overview

This repository accompanies the paper:

Cojocaru, C., & Ionescu, S. (2025). A Supervised Framework for Document Processing at Scale with Large Language Models in Credit-Risk Research. Proceedings of ICMIE 2025.

It implements a supervised, schema-guided pipeline for document metadata extraction and evaluation using Large Language Models (LLMs).
Two notebooks are provided: one for the extraction framework (with a Gradio UI) and one for evaluation and benchmarking.
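
To illustrate what "schema-guided" can mean in practice, here is a minimal sketch that checks an extracted metadata record against a schema. The field names in `SCHEMA` and the helper `validate_record` are hypothetical and chosen only for illustration; the actual schema and checks used in the paper are defined in the notebooks and may differ.

```python
from typing import Any

# Hypothetical schema (field name -> expected Python type).
# This is an assumption for illustration, not the schema used in the paper.
SCHEMA: dict[str, type] = {"title": str, "authors": list, "year": int}


def validate_record(record: dict[str, Any], schema: dict[str, type] = SCHEMA) -> list[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems: list[str] = []
    # Every schema field must be present and have the expected type.
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"wrong type for {field}: expected {expected.__name__}")
    # Fields outside the schema are reported rather than silently accepted.
    for field in record:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems
```

A record that passes returns an empty list, so the result can be used directly as a gate before a JSON output is stored or scored.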


📁 Repository Structure

notebooks/
├── 01_supervised_document_framework.ipynb → Gradio interface and extraction pipeline
└── 02_evaluation_of_jsons.ipynb → Evaluation and benchmarking logic

data/
├── Extracted JSON outputs per model
├── Centralized metadata CSV
└── evaluation/
    ├── Evaluation metrics (CSV)
    └── Radar visualization files (PNG)

🚀 Run on Google Colab

The notebooks run on Google Colab, where all dependencies are pre-installed; local execution is optional.


⚙️ Local Setup (Optional)

pip install -r requirements.txt

🧩 Data Note

Original PDFs are not redistributed due to copyright restrictions. All derived JSON and CSV files necessary for reproducibility are provided in data/ and data/evaluation/.
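
As a starting point for working with the provided files, the sketch below collects every JSON file found under a data directory. The exact file names and subfolder layout inside data/ are not specified here, so the function simply walks the tree; `load_json_outputs` is a hypothetical helper, not part of the repository.

```python
import json
from pathlib import Path


def load_json_outputs(data_dir: str = "data") -> dict[str, object]:
    """Load every *.json file under data_dir, keyed by its relative path."""
    root = Path(data_dir)
    outputs: dict[str, object] = {}
    # rglob searches subdirectories too; the real layout of data/ is an assumption.
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            outputs[path.relative_to(root).as_posix()] = json.load(fh)
    return outputs
```

Keying by relative path keeps outputs from different models distinguishable if they are stored in separate subfolders.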

📊 Reproducibility

The following materials are provided for reproducing the reported results:

  • Baseline and LLM-generated metadata JSONs
  • Evaluation metrics (precision, recall, F1, Weighted Score)
  • Visualization data (radar plots and summaries)
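
For readers unfamiliar with the metrics listed above, here is a minimal sketch of field-level precision, recall, and F1 between a baseline record and an LLM-generated record. It assumes exact-match comparison of flat key/value pairs, which may not be how the evaluation notebook matches fields; the Weighted Score is not reproduced because its definition is given in the paper rather than here. The function name `field_prf` is hypothetical.

```python
def field_prf(baseline: dict, predicted: dict) -> tuple[float, float, float]:
    """Compute (precision, recall, F1) over flat metadata fields."""
    # A predicted field counts as correct only if the key exists in the
    # baseline and the value matches exactly (an assumption of this sketch).
    correct = sum(1 for k, v in predicted.items() if k in baseline and baseline[k] == v)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(baseline) if baseline else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, with two baseline fields and two predicted fields of which one matches, precision, recall, and F1 are all 0.5.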

📜 Citation

Cojocaru, C., & Ionescu, S. (2025). A Supervised Framework for Document Processing at Scale with Large Language Models in Credit-Risk Research. Proceedings of ICMIE 2025.

🪪 License

This project is released under the MIT License.

🧭 Code Availability

GitHub: https://github.com/cojocarucosmin/supervised-llm-document-framework (release v1.0)
