feat(docx): Process drawingml objects in docx #2453
✅ DCO Check Passed
Thanks @rateixei, all your commits are properly signed off. 🎉
Merge Protections
Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit: Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
🟢 Require two reviewer for test updates: Wonderful, this rule succeeded. When test data is updated, we require two reviewers.
Codecov Report
❌ Patch coverage is
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
…ail.com> I, Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>, hereby add my Signed-off-by to this commit: 9518fff Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
lgtm
- IMHO, LibreOffice should be an optional dependency in Docling. However, if a user does not have LibreOffice, the backend test fails.
We could first check whether LibreOffice is installed and compare the output accordingly.
FAILED tests/test_backend_msword.py::test_e2e_docx_conversions - AssertionError: export to markdown failed on tests/data/docx/drawingml.docx
- Just FYI for later: instead of an environment variable, we could use backend options to set the path to LibreOffice, since they are more transparent and easier to document. In PR #2011, which I am preparing, I introduce backend options.
- Could we extend this feature to the xlsx and pptx backends?
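The first suggestion above, checking for LibreOffice before comparing outputs, could look roughly like the sketch below. It is not the test code from this PR: the helper names, the optional explicit command (standing in for the backend-options idea), and the `.no_libreoffice.md` ground-truth naming are all hypothetical.

```python
import shutil
from pathlib import Path
from typing import Optional


def have_libreoffice(cmd: Optional[str] = None) -> bool:
    """Return True if a LibreOffice executable appears to be usable.

    `cmd` is a hypothetical explicit path (e.g. from a backend option);
    without it, we probe PATH for the common executable names.
    """
    if cmd:
        return Path(cmd).is_file() or shutil.which(cmd) is not None
    return any(shutil.which(name) for name in ("soffice", "libreoffice"))


def groundtruth_path(gt_dir: Path, doc_name: str, libreoffice: bool) -> Path:
    """Pick the expected Markdown file for a test document.

    Hypothetical naming: `<doc>.md` when DrawingML objects are rendered,
    `<doc>.no_libreoffice.md` when LibreOffice is missing.
    """
    suffix = ".md" if libreoffice else ".no_libreoffice.md"
    return gt_dir / f"{doc_name}{suffix}"
```

A test would then compare the Markdown export against `groundtruth_path(gt_dir, "drawingml.docx", have_libreoffice())`, so the same test passes with or without LibreOffice while still checking the rendered output when it is available.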
Signed-off-by: Michele Dolfi <dol@zurich.ibm.com>
Signed-off-by: Rafael Teixeira de Lima <Rafael.td.lima@gmail.com>
🏆
This PR imports DOCX DrawingML objects as PNG images into DoclingDocument objects. These include diagrams, hand-drawn shapes, and Word/Excel charts.
This is performed with the following steps:
…`DOCLING_LIBREOFFICE_CMD`). If LibreOffice is not available, a warning is displayed.

An example DOCX file containing diagrams, figures, and charts is attached, along with its DoclingDocument export.
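A minimal sketch of how the LibreOffice command could be resolved, assuming `DOCLING_LIBREOFFICE_CMD` takes precedence and that the usual `soffice`/`libreoffice` executables are probed on `PATH` otherwise. The function names are hypothetical, and the warning is emitted with Python's `warnings` module here, whereas the actual backend may log it instead. `--headless`, `--convert-to` and `--outdir` are standard LibreOffice command-line options; converting to PDF as an intermediate step is an assumption, not necessarily what this PR does.

```python
import os
import shutil
import warnings
from pathlib import Path
from typing import List, Optional

ENV_VAR = "DOCLING_LIBREOFFICE_CMD"  # variable name from the PR description


def resolve_libreoffice_cmd() -> Optional[str]:
    """Return a LibreOffice command, or None (with a warning) if none is found.

    Assumption: the environment variable wins; otherwise look for the
    common executable names on PATH.
    """
    cmd = os.environ.get(ENV_VAR)
    if cmd:
        return cmd
    for name in ("soffice", "libreoffice"):
        found = shutil.which(name)
        if found:
            return found
    warnings.warn(
        f"LibreOffice not found (set {ENV_VAR}); "
        "DrawingML objects will not be rendered.",
        RuntimeWarning,
    )
    return None


def headless_convert_args(cmd: str, src: Path, outdir: Path, fmt: str = "pdf") -> List[str]:
    """Build the argv for a headless LibreOffice conversion (not executed here)."""
    return [cmd, "--headless", "--convert-to", fmt, "--outdir", str(outdir), str(src)]
```

The argument list could then be passed to `subprocess.run(args, check=True)`; rasterizing the resulting page images to PNG would need a separate step or tool.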
Checklist:
drawingml_example.docx
drawingml_example.json
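To check which kinds of DrawingML objects a file such as `drawingml_example.docx` contains, one can read `word/document.xml` from the DOCX zip and inspect the `uri` attribute of each `a:graphicData` element. This is a standalone sketch using only the standard library, not the backend's code; the URI values follow ECMA-376 and Microsoft's Word 2010 extensions, while the short kind labels are arbitrary.

```python
import xml.etree.ElementTree as ET
import zipfile
from collections import Counter
from pathlib import Path
from typing import BinaryIO, Union

NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}

# a:graphicData/@uri value -> coarse object kind (labels are illustrative).
GRAPHIC_KINDS = {
    "http://schemas.openxmlformats.org/drawingml/2006/picture": "picture",
    "http://schemas.openxmlformats.org/drawingml/2006/chart": "chart",
    "http://schemas.openxmlformats.org/drawingml/2006/diagram": "diagram",
    "http://schemas.microsoft.com/office/word/2010/wordprocessingShape": "shape",
    "http://schemas.microsoft.com/office/word/2010/wordprocessingGroup": "group",
    "http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas": "canvas",
}


def classify_drawings(docx: Union[str, Path, BinaryIO]) -> Counter:
    """Count w:drawing elements in the main document part by graphicData kind."""
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    kinds: Counter = Counter()
    for drawing in root.iter(f"{{{NS['w']}}}drawing"):
        graphic_data = drawing.find(".//a:graphicData", NS)
        uri = graphic_data.get("uri") if graphic_data is not None else None
        kinds[GRAPHIC_KINDS.get(uri, "other")] += 1
    return kinds
```

`classify_drawings("drawingml_example.docx")` returns a `Counter` keyed by kind. Only the main document part is scanned, so drawings in headers, footers, or footnotes are not counted.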