devinbost

PR title

feat(plugins): introduce plugin hook framework and lifecycle events

Description

Adds an extensible plugin hook framework to Docling's conversion pipeline, allowing users to register custom logic at well-defined lifecycle events without modifying core code. Default behavior is unchanged when no plugins are configured. Branch reference: feat/plugin-hooks.

Motivation

  • Enable safe, first-class extensibility for pre/post-processing, metadata enrichment, redaction, normalization, analytics, and integrations.
  • Provide a stable surface to experiment with custom transformations while keeping core logic maintainable.

What’s changed

  • Introduced a centralized plugin manager and lifecycle hook dispatching.
  • Added lifecycle events (examples): before_convert, after_convert, before_parse_page, after_parse_page, before_export, after_export.
  • Integrated hook dispatch into the pipeline with negligible overhead when no plugins are registered.
  • Added configuration to register plugins via Python API and optional CLI flags.
  • Guardrails: deterministic execution order, fail-fast/error bubbling by default, optional continue-on-error mode.
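To make the guardrails above concrete, here is a minimal sketch of how a hook dispatcher with these properties might look. It is not the PR's actual implementation: the names `PluginManager`, `register`, `dispatch`, and `continue_on_error` are hypothetical and chosen for illustration only. Only the event names come from the list above.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A hook receives a mutable context dict (hypothetical shape, for illustration).
Hook = Callable[[dict], None]


class PluginManager:
    """Hypothetical hook registry; NOT the class introduced by this PR."""

    def __init__(self, continue_on_error: bool = False) -> None:
        self.continue_on_error = continue_on_error
        self._hooks: DefaultDict[str, List[Hook]] = defaultdict(list)
        self.errors: List[Exception] = []

    def register(self, event: str, hook: Hook) -> None:
        # Appending to a list keeps registration order, which gives
        # deterministic execution order at dispatch time.
        self._hooks[event].append(hook)

    def dispatch(self, event: str, context: dict) -> None:
        hooks = self._hooks.get(event)
        if not hooks:
            # No plugins registered for this event: return immediately (no-op).
            return
        for hook in hooks:
            try:
                hook(context)
            except Exception as exc:
                if not self.continue_on_error:
                    # Fail-fast default: let the error bubble up to the caller.
                    raise
                # Optional continue-on-error mode: record and keep going.
                self.errors.append(exc)


def run_demo() -> tuple:
    """Register three hooks, one of which fails, in continue-on-error mode."""
    manager = PluginManager(continue_on_error=True)
    calls: List[str] = []

    def broken(context: dict) -> None:
        raise ValueError("boom")

    manager.register("before_convert", lambda ctx: calls.append("first"))
    manager.register("before_convert", broken)
    manager.register("before_convert", lambda ctx: calls.append("third"))
    manager.dispatch("before_convert", {"source": "example.pdf"})
    return calls, manager.errors
```

In this sketch, `run_demo()` runs the first and third hooks in registration order and collects the `ValueError` from the second one instead of raising it. With the default `continue_on_error=False`, the same dispatch would raise on the second hook and the third would never run.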

API/CLI

  • Python API: optional plugin registration during converter setup.
  • CLI: optional flags to load plugins and pass configuration (off by default).
  • Backward compatible: no changes required for existing users.

Documentation

  • Added a Plugins/Extensions guide with authoring, registration, configuration, and testing steps.
  • Updated README to reference the plugin system and examples.

Tests

  • Unit tests for dispatch ordering, error bubbling, and no-op overhead when disabled.
  • Integration tests covering end-to-end runs with sample plugins and optional CLI loading.
  • Coverage includes concurrency-sensitive paths and common failure scenarios.

Performance and security notes

  • Minimal overhead when no plugins are present (no-op dispatch).
  • Plugins run in-process with host privileges; avoid loading untrusted plugins.

Breaking changes

  • None. Existing behavior is unchanged unless plugins are explicitly configured.

Checklist

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.

Contributor

github-actions bot commented Sep 3, 2025

DCO Check Failed

Hi @devinbost, your pull request has failed the Developer Certificate of Origin (DCO) check.

This repository supports remediation commits, so you can fix this without rewriting history — but you must follow the required message format.


🛠 Quick Fix: Add a remediation commit

Run this command:

git commit --allow-empty -s -m "DCO Remediation Commit for Devin Bost <devin.bost@gmail.com>

I, Devin Bost <devin.bost@gmail.com>, hereby add my Signed-off-by to this commit: 86bc70e1c926a11b2f6c9ffd73384e3e0eac7ba4
I, Devin Bost <devin.bost@gmail.com>, hereby add my Signed-off-by to this commit: 054ffa6066f4d2a3fd96e111d34bcb5d13cf3741
I, Devin Bost <devinbost@users.noreply.github.com>, hereby add my Signed-off-by to this commit: bf4c112f177e20bd2f25c3f83563cb48dd9ade32"
git push

🔧 Advanced: Sign off each commit directly

For the latest commit:

git commit --amend --signoff
git push --force-with-lease

For multiple commits:

git rebase --signoff origin/main
git push --force-with-lease

More info: DCO check report

dosubot bot commented Sep 3, 2025

Related Documentation

Checked 2 published document(s). No updates required.

mergify bot commented Sep 3, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 Enforce conventional commit

This rule is failing.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
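For checking a title locally before pushing, the rule above can be approximated with Python's `re` module. This is a sketch that assumes Mergify's `~=` operator behaves like a regular-expression search; the helper name `is_conventional` and the sample titles are made up for illustration.

```python
import re

# Pattern copied verbatim from the merge-protection rule above.
TITLE_PATTERN = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix.

    Assumption: a search anchored by the pattern's own ^ approximates
    how the rule is evaluated.
    """
    return TITLE_PATTERN.search(title) is not None


if __name__ == "__main__":
    # Hypothetical titles, used only to exercise the pattern.
    for title in ("feat(api): add option", "fix: typo", "Add plugin hooks"):
        print(f"{title!r}: {is_conventional(title)}")
```

The pattern requires one of the listed types, then an optional parenthesized scope, then an optional `!`, then a colon. A title such as "Add plugin hooks" therefore fails, as does one using an unlisted type such as "feature:".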

@dolfim-ibm
Contributor

We are planning to greatly expand the types of plugins that can be registered in Docling.

For example:

  • Custom document formats
  • OCR engines (already available)
  • Custom enrichment steps
  • Custom VLM steps
  • Post-processing (e.g. layout-specific processing)

Happy to work together on bringing this level of extensibility.
