@sidmohan0
Contributor

Engine Selection and Structured Output for TextService

Changes

This PR implements the engine selection functionality for the TextService class as outlined in ticket #XX. It allows users to choose between different annotation engines while maintaining backward compatibility.

New Features

  • Engine Selection: Added an engine parameter to TextService that accepts:

    • "regex": Uses only RegexAnnotator (fastest, pattern-based)
    • "spacy": Uses only SpacyPIIAnnotator (more comprehensive)
    • "auto": Default mode that tries regex first and falls back to spaCy if no entities are found
  • Structured Output: Added a structured parameter to annotation methods; when enabled, they return a list of Span objects with:

    • label: Entity type (e.g., "EMAIL", "PERSON")
    • start: Character offset where entity begins
    • end: Character offset where entity ends
    • text: The actual text of the entity
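
For reviewers, the engine modes and the Span shape described above can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this PR: the `annotate` function, the `EMAIL_RE` pattern, and the `fallback` callable (standing in for the spaCy annotator) are all hypothetical names introduced here.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Span:
    """Illustrative stand-in for the structured output described above."""
    label: str  # entity type, e.g. "EMAIL"
    start: int  # character offset where the entity begins
    end: int    # character offset where the entity ends (exclusive)
    text: str   # the matched text


# Hypothetical pattern; the real RegexAnnotator covers more entity types.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def regex_annotate(text: str) -> List[Span]:
    """Pattern-based pass: fast, but limited to what the patterns cover."""
    return [Span("EMAIL", m.start(), m.end(), m.group())
            for m in EMAIL_RE.finditer(text)]


def annotate(
    text: str,
    engine: str = "auto",
    fallback: Callable[[str], List[Span]] = lambda t: [],
) -> List[Span]:
    """Dispatch on engine; `fallback` is an assumed stand-in for spaCy."""
    if engine == "regex":
        return regex_annotate(text)
    if engine == "spacy":
        return fallback(text)
    if engine == "auto":
        spans = regex_annotate(text)
        # Fall back only when the regex pass found nothing.
        return spans if spans else fallback(text)
    raise ValueError(f"unknown engine: {engine!r}")
```

Under this sketch, "auto" only pays for the slower engine when the regex pass comes back empty, which matches the fallback behaviour described in the list above.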

Implementation Details

  • Modified all TextService methods to support both legacy dictionary output and new structured output
  • Ensured proper handling of text chunks with correct position adjustments
  • Added comprehensive test coverage for all new features
  • Updated documentation in code comments, README, and CHANGELOG
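
The position adjustment for text chunks mentioned above could look something like the sketch below. It assumes fixed-size chunks and a per-chunk annotator that returns chunk-relative offsets; `annotate_in_chunks`, `find_emails`, and `chunk_size` are hypothetical names for illustration, not the PR's actual implementation.

```python
import re
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class Span:
    """Illustrative stand-in for the structured output type."""
    label: str
    start: int
    end: int
    text: str


def find_emails(chunk: str) -> List[Span]:
    """Hypothetical per-chunk annotator; offsets are relative to the chunk."""
    pattern = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
    return [Span("EMAIL", m.start(), m.end(), m.group())
            for m in pattern.finditer(chunk)]


def annotate_in_chunks(
    text: str,
    annotate_chunk: Callable[[str], List[Span]],
    chunk_size: int = 1000,
) -> List[Span]:
    """Annotate fixed-size chunks and shift spans to full-text offsets."""
    spans: List[Span] = []
    for offset in range(0, len(text), chunk_size):
        chunk = text[offset:offset + chunk_size]
        for span in annotate_chunk(chunk):
            # Add the chunk's starting offset so positions index into `text`.
            spans.append(replace(span,
                                 start=span.start + offset,
                                 end=span.end + offset))
    return spans
```

After adjustment, `text[span.start:span.end] == span.text` should hold for every returned span. Note that this naive split would miss an entity that straddles a chunk boundary; how the PR handles that case is not stated in this description.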

Testing

All tests pass, including the new integration tests specifically created for these features. The implementation maintains backward compatibility with existing code.

- Add engine parameter to TextService allowing 'regex', 'spacy', or 'auto' modes
- Implement auto-fallback mechanism that tries regex first, falls back to spaCy
- Add structured output option returning Span objects with position information
- Create comprehensive integration tests for the new features
- Update documentation in code comments, README, and CHANGELOG
@sidmohan0 sidmohan0 self-assigned this May 2, 2025
@sidmohan0 sidmohan0 added this to the 4.1.0 milestone May 2, 2025
@sidmohan0 sidmohan0 merged commit 8dd0053 into feat/regex-fallback May 2, 2025
@sidmohan0 sidmohan0 deleted the feat/text-service-engine-selection branch May 2, 2025 23:35