From 9dd7b4472019cf366349fcc69051f318dbb51108 Mon Sep 17 00:00:00 2001
From: 6801318d8d <144167388+6801318d8d@users.noreply.github.com>
Date: Sun, 6 Jul 2025 18:34:42 +0200
Subject: [PATCH] Fix link to Docling

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index c140392..7c471f3 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 
 # spaCy Layout: Process PDFs, Word documents and more with spaCy
 
-This plugin integrates with [Docling](https://ds4sd.github.io/docling/) to bring structured processing of **PDFs**, **Word documents** and other input formats to your [spaCy](https://spacy.io) pipeline. It outputs clean, **structured data** in a text-based format and creates spaCy's familiar [`Doc`](https://spacy.io/api/doc) objects that let you access labelled text spans like sections or headings, and tables with their data converted to a `pandas.DataFrame`.
+This plugin integrates with [Docling](https://github.com/docling-project/docling) to bring structured processing of **PDFs**, **Word documents** and other input formats to your [spaCy](https://spacy.io) pipeline. It outputs clean, **structured data** in a text-based format and creates spaCy's familiar [`Doc`](https://spacy.io/api/doc) objects that let you access labelled text spans like sections or headings, and tables with their data converted to a `pandas.DataFrame`.
 
 This workflow makes it easy to apply powerful **NLP techniques** to your documents, including linguistic analysis, named entity recognition, text classification and more. It's also great for implementing **chunking for RAG** pipelines.
 