A Supervised Framework for Document Processing at Scale with Large Language Models in Credit-Risk Research

Authors:
Cosmin Cojocaru¹ and Sorin Ionescu²
¹ National University of Science and Technology Politehnica Bucharest, Romania
² National University of Science and Technology Politehnica Bucharest, Romania

📘 Overview

This repository accompanies the paper:

Cojocaru, C., & Ionescu, S. (2025). A Supervised Framework for Document Processing at Scale with Large Language Models in Credit-Risk Research. Proceedings of ICMIE 2025.

It implements a supervised, schema-guided pipeline for document metadata extraction and evaluation using Large Language Models (LLMs).
Two notebooks are provided: one for the extraction framework (with a Gradio UI) and one for evaluation and benchmarking.
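
As a rough illustration of what "schema-guided" extraction can look like, the sketch below checks a model's JSON response against a fixed set of expected fields. It is not taken from the notebooks: the field names in `SCHEMA` and the helper `validate_metadata` are hypothetical, and the schema used in the paper may differ.

```python
import json

# Hypothetical schema: field names and types are illustrative assumptions,
# not the schema used in the paper or the notebooks.
SCHEMA = {
    "title": str,
    "authors": list,
    "year": int,
    "keywords": list,
}


def validate_metadata(raw_response: str, schema: dict = SCHEMA) -> tuple[dict, list[str]]:
    """Parse a model response as JSON and check it against the schema.

    Returns the fields that passed validation and a list of problems found.
    """
    try:
        parsed = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        return {}, [f"invalid JSON: {exc.msg}"]
    if not isinstance(parsed, dict):
        return {}, ["top-level value is not a JSON object"]

    record, problems = {}, []
    for field, expected_type in schema.items():
        value = parsed.get(field)
        if field not in parsed:
            problems.append(f"missing field: {field}")
        # bool is a subclass of int in Python, so reject it explicitly
        # (this illustrative schema has no boolean fields).
        elif isinstance(value, bool) or not isinstance(value, expected_type):
            problems.append(f"wrong type for {field}: expected {expected_type.__name__}")
        else:
            record[field] = value

    # Report fields the schema does not know about instead of silently keeping them.
    for field in sorted(set(parsed) - set(schema)):
        problems.append(f"unexpected field: {field}")
    return record, problems
```

A caller could use the returned problem list to decide whether to accept the record, re-prompt the model, or flag the document for manual review.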

📁 Repository Structure

notebooks/
├── 01_supervised_document_framework.ipynb → Gradio interface and extraction pipeline
└── 02_evaluation_of_jsons.ipynb → Evaluation and benchmarking logic

data/
├── Extracted JSON outputs per model
├── Centralized metadata CSV
└── evaluation/
    ├── Evaluation metrics (CSV)
    └── Radar visualization files (PNG)

🚀 Run on Google Colab

All dependencies are pre-installed in Colab.
Local execution is optional.

⚙️ Local Setup (Optional)

To run the notebooks locally, install the dependencies first:

pip install -r requirements.txt

🧩 Data Note

Original PDFs are not redistributed due to copyright restrictions. All derived JSON and CSV files necessary for reproducibility are provided in data/ and data/evaluation/.

📊 Reproducibility

The repository includes the following artifacts for reproducing the results:

Baseline and LLM-generated metadata JSONs
Evaluation metrics (precision, recall, F1, Weighted Score)
Visualization data (radar plots and summaries)
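
For readers unfamiliar with the metrics listed above, here is a minimal sketch of field-level precision, recall, and F1 for one extracted record against its baseline. It assumes exact-match comparison per field, which may not be how the evaluation notebook scores fields; the function name `field_scores` is hypothetical. The Weighted Score is defined in the paper and is not reproduced here.

```python
def field_scores(baseline: dict, predicted: dict) -> dict:
    """Score one predicted metadata record against its baseline record.

    Assumption: a field counts as correct only if its value matches the
    baseline exactly. Empty values are treated as "field not provided".
    """
    empty = (None, "", [])
    base = {k: v for k, v in baseline.items() if v not in empty}
    pred = {k: v for k, v in predicted.items() if v not in empty}

    # True positives: predicted fields whose value equals the baseline value.
    true_pos = sum(1 for k, v in pred.items() if k in base and base[k] == v)

    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(base) if base else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative records (made up for this example, not taken from data/).
baseline = {"title": "A", "year": 2020, "authors": ["X", "Y"]}
predicted = {"title": "A", "year": 2021, "authors": ["X", "Y"], "keywords": ["k"]}
scores = field_scores(baseline, predicted)
```

In this example two of the four predicted fields match the baseline (precision 0.5) and two of the three baseline fields are recovered (recall about 0.67).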

📜 Citation

Cojocaru, C., & Ionescu, S. (2025). A Supervised Framework for Document Processing at Scale with Large Language Models in Credit-Risk Research. ICMIE 2025.

🪪 License

This project is released under the MIT License.

🧭 Code Availability

GitHub: https://github.com/cojocarucosmin/supervised-llm-document-framework (release v1.0)
