v0.3.15
[0.3.15]
Added
- Intelligent Document Discovery Module for Automated Configuration Generation
  - Added Discovery module that automatically analyzes document samples to identify structure, field types, and organizational patterns
  - Pattern-Neutral Design: Works across all processing patterns (1, 2, 3) with unified discovery process and pattern-specific implementations
  - Dual Discovery Methods: Discovery without ground truth (exploratory analysis) and with ground truth (optimization using labeled data)
  - Automated Blueprint Creation: Pattern 1 includes zero-touch BDA blueprint generation with intelligent change detection and version management
  - Web UI Integration: Real-time discovery job monitoring, interactive results review, and seamless configuration integration
  - Advanced Features: Multi-model support (Nova, Claude), customizable prompts, configurable parameters, ground truth processing, schema conversion, and lifecycle management
  - Key Benefits: Rapid new document type onboarding, reduced time-to-production, configuration optimization, and automated workflow bootstrapping
  - Use Cases: New document exploration, configuration improvement, rapid prototyping, and document understanding
  - Documentation: Guide in `docs/discovery.md` with architecture details, best practices, and troubleshooting
- Optional Pattern-2 Regex-Based Classification for Enhanced Performance
  - Added support for optional regex patterns in document class definitions for performance optimization
  - Document Name Regex: Match against document ID/name to classify all pages without LLM processing when all pages should be the same class
  - Document Page Content Regex: Match against page text content during multi-modal page-level classification for fast page classification
  - Key Benefits: Significant performance improvements and cost savings by bypassing LLM calls for pattern-matched documents, deterministic classification results for known document patterns, seamless fallback to existing LLM classification when regex patterns don't match
  - Configuration: Optional `document_name_regex` and `document_page_content_regex` fields in class definitions with automatic regex compilation and validation
  - Logging: Comprehensive info-level logging when regex patterns match for observability and debugging
  - CloudFormation Integration: Updated Pattern-2 schema to support regex configuration through the Web UI
  - Demonstration: New `step2_classification_with_regex.ipynb` notebook showcasing regex configuration and performance comparisons
  - Documentation: Enhanced classification module README and main documentation with regex usage examples and best practices
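The regex short-circuit described above can be illustrated with a minimal sketch. Only the field names `document_name_regex` and `document_page_content_regex` come from these notes. The class definitions, the helper names, the use of `re.search`, and the `llm_fallback` callable are assumptions made for illustration and may not match the actual classification module.

```python
import re
from typing import Callable, Optional

# Hypothetical class definitions; only the two regex field names are from the release notes.
CLASSES = [
    {
        "name": "Payslip",
        "document_name_regex": r"(?i)payslip",
        "document_page_content_regex": r"(?i)\bgross pay\b",
    },
    {
        "name": "W2",
        "document_page_content_regex": r"(?i)\bwage and tax statement\b",
    },
]

REGEX_FIELDS = ("document_name_regex", "document_page_content_regex")


def compile_patterns(classes: list) -> list:
    """Compile the optional regex fields once; re.compile raises re.error on an invalid pattern."""
    compiled = []
    for cls in classes:
        entry = {"name": cls["name"]}
        for field in REGEX_FIELDS:
            pattern = cls.get(field)
            entry[field] = re.compile(pattern) if pattern else None
        compiled.append(entry)
    return compiled


def classify_by_name(document_id: str, compiled: list) -> Optional[str]:
    """Return a class for the whole document if its ID matches, else None."""
    for cls in compiled:
        rx = cls["document_name_regex"]
        if rx and rx.search(document_id):
            return cls["name"]
    return None


def classify_page(text: str, compiled: list, llm_fallback: Callable[[str], str]) -> str:
    """Try the page-content regexes first; call the (assumed) LLM classifier only on a miss."""
    for cls in compiled:
        rx = cls["document_page_content_regex"]
        if rx and rx.search(text):
            return cls["name"]
    return llm_fallback(text)


COMPILED = compile_patterns(CLASSES)
```

In this sketch a document named `2024-03-payslip-001.pdf` is classified from its name alone, while a page with no regex match is passed to whatever fallback classifier the caller supplies.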
- Windows WSL Development Environment Setup Guide
  - Added WSL-based development environment setup guide for Windows developers in `docs/setup-development-env-WSL.md`
  - Key Features: Automated setup script (`wsl_setup.sh`) for quick installation of Git, Python, Node.js, AWS CLI, and SAM CLI
  - Integrated Workflow: Development setup combining Windows tools (VS Code, browsers) with native Linux environment
  - Target Use Cases: Windows developers needing Linux compatibility without Docker Desktop or VM overhead
Fixed
- Throttling Error Detection and Retry Logic for Assessment Functions - GitHub Issue #45
  - Assessment Function: Enhanced throttling detection to check for throttling errors returned in `document.errors` field in addition to thrown exceptions, raising `ThrottlingException` to trigger Step Functions retry when throttling is detected
  - Granular Assessment Task Caching: Fixed caching logic to properly cache successful assessment tasks when there are ANY failed tasks (both exception-based and result-based failures), enabling efficient retry optimization by only reprocessing failed tasks while preserving successful results
  - Impact: Improved resilience for throttling scenarios, reduced redundant processing during retries, and better Step Functions retry behavior
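The two fixes above can be sketched roughly as follows. The `document.errors` field and the `ThrottlingException` name come from these notes. The marker strings, the `Document` stand-in, the `run_tasks` helper, and the `"error"` key used for result-based failures are all assumptions for illustration, not the project's actual code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class ThrottlingException(Exception):
    """Stand-in for the exception name the Step Functions retry policy is assumed to match."""


# Assumed substrings that indicate throttling; the real function may check others.
THROTTLE_MARKERS = ("throttlingexception", "too many requests", "rate exceeded")


@dataclass
class Document:
    """Minimal stand-in for the document object; only `errors` is modeled."""
    errors: List[str] = field(default_factory=list)


def raise_if_throttled(document: Document) -> None:
    """Surface throttling that was reported in document.errors rather than raised."""
    for err in document.errors:
        if any(marker in err.lower() for marker in THROTTLE_MARKERS):
            raise ThrottlingException(err)


def run_tasks(tasks: Dict[str, Callable[[], dict]], cache: Dict[str, dict]) -> Dict[str, str]:
    """Run uncached tasks, caching each success even if other tasks fail.

    Returns a mapping of failed task IDs to error messages, covering both
    exception-based failures and result-based failures (an "error" key).
    """
    failures: Dict[str, str] = {}
    for task_id, fn in tasks.items():
        if task_id in cache:
            continue  # preserved from an earlier attempt; do not reprocess
        try:
            result = fn()
        except Exception as exc:
            failures[task_id] = str(exc)
            continue
        if result.get("error"):
            failures[task_id] = result["error"]
        else:
            cache[task_id] = result
    return failures
```

On a retry with the same `cache`, only the task IDs that previously failed are executed again, which is the retry optimization the fix describes.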
- Security Vulnerability Mitigation - Package Updates
- GovCloud Compatibility - Hardcoded Service Domain References
  - Fixed hardcoded `amazonaws.com` references in CloudFormation templates that prevented GovCloud deployment
  - Updated all service principals and endpoints to use dynamic `${AWS::URLSuffix}` expressions for automatic region-based resolution
  - Templates Updated: `template.yaml` (main template), `patterns/pattern-3/sagemaker_classifier_endpoint.yaml`
  - Services Fixed: EventBridge, Cognito, SageMaker, ECR, CloudFront, CodeBuild, AppSync, Lambda, DynamoDB, CloudWatch Logs, Glue
  - Resolves GitHub Issue #50 - templates now deploy correctly in both standard AWS and GovCloud regions
- Bug Fixes and Code Improvements
  - Fixed HITL processing errors in both Pattern-1 (DynamoDB validation with empty strings) and Pattern-2 (string indices error in A2I output processing)
  - Fixed Step Function UI issues including auto-refresh button auto-disable and fetch failures for failed executions with datetime serialization errors
  - Cleaned up unused Step Function subscription infrastructure and removed duplicate code in Pattern-2 HITL function
  - Expanded UI Visual Editor bounding box size with padding for better visibility and user interaction
  - Fixed bug in the list of models supporting cache points, which previously excluded Claude 4 Sonnet and Opus
  - Added validation at the assessment step to check that the response is valid JSON. The validation fails after extraction/assessment is complete if JSON parsing issues are encountered
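The JSON validation in the last item could look something like the sketch below. It assumes parsing problems are collected while processing runs and reported only once everything has finished, which is one reading of "fails after extraction/assessment is complete". The function name `validate_json_responses` and the dict-of-raw-strings input are hypothetical.

```python
import json
from typing import Dict


def validate_json_responses(responses: Dict[str, str]) -> Dict[str, object]:
    """Parse every raw response, then fail once at the end if any were invalid JSON."""
    parsed: Dict[str, object] = {}
    errors: Dict[str, str] = {}
    for name, raw in responses.items():
        try:
            parsed[name] = json.loads(raw)
        except json.JSONDecodeError as exc:
            # Record the problem and keep going so all failures are reported together.
            errors[name] = f"{exc.msg} (char {exc.pos})"
    if errors:
        raise ValueError(f"Invalid JSON in responses: {errors}")
    return parsed
```

Deferring the failure until the loop ends means a single malformed response does not hide problems in the others.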