Description
Requested feature
Add support for extracting charts from PPTX files in Docling.
Currently, Docling ignores charts when parsing .pptx presentations, which means valuable data (e.g., bar/line/pie charts) is lost in downstream processing.
The feature could expose chart metadata and data series in a structured format (e.g., JSON) so that downstream tools can either visualize them or analyze the data programmatically.
Example JSON output for a simple bar chart:
This would make Docling much more useful for users working with business presentations where charts are as important as text.
Alternatives
Currently, the only workaround is to parse PPTX files manually using python-pptx, walk through all shapes, extract the chart data and build a custom JSON output.
While this works, it requires writing and maintaining additional code outside Docling, and prevents a unified API for accessing text + charts.
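For reference, the manual workaround described above can be sketched roughly as follows. This is a minimal illustration, not a proposed Docling implementation: the helper names `chart_to_dict` and `extract_charts` and the output field names (`slide`, `chart_type`, `title`, `categories`, `series`) are hypothetical choices made for this example. It assumes python-pptx is installed, reads only the first plot of each chart, and does not descend into grouped shapes.

```python
def chart_to_dict(chart, slide_number):
    """Convert one python-pptx chart into a plain, JSON-serializable dict.

    The field names are hypothetical; Docling defines no such schema today.
    """
    plot = chart.plots[0]  # first plot only; combo charts contain several
    chart_type = chart.chart_type
    return {
        "slide": slide_number,
        # Newer python-pptx enums expose .name; fall back to str() otherwise.
        "chart_type": getattr(chart_type, "name", str(chart_type)),
        # Reading chart_title on an untitled chart would create one, so guard it.
        "title": chart.chart_title.text_frame.text if chart.has_title else None,
        "categories": [str(category) for category in plot.categories],
        "series": [
            {"name": series.name, "values": list(series.values)}
            for series in plot.series
        ],
    }


def extract_charts(pptx_path):
    """Walk every slide and collect a dict for each chart shape found."""
    # Imported here so chart_to_dict stays usable without python-pptx installed.
    from pptx import Presentation  # third-party: pip install python-pptx

    charts = []
    slides = Presentation(pptx_path).slides
    for slide_number, slide in enumerate(slides, start=1):
        for shape in slide.shapes:  # top-level shapes only, groups are skipped
            if shape.has_chart:
                charts.append(chart_to_dict(shape.chart, slide_number))
    return charts
```

Calling `json.dumps(extract_charts("deck.pptx"), indent=2)` (with `deck.pptx` standing in for a real file) would then give one JSON object per chart. Even at this size, the sketch shows the duplication a built-in feature would remove: slide traversal and chart handling live outside Docling's own document model.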
Additional context
I’d be happy to contribute an initial implementation using python-pptx, at least for the most common chart types (bar/column/line/pie).
Would you accept a PR adding this feature behind a flag or as part of the default PPTX parsing pipeline?