Commit bd2ecb5

Merge branch 'feature/permission-boundaries' into 'develop'
Feature/permission boundaries

See merge request genaiic-reusable-assets/engagement-artifacts/genaiic-idp-accelerator!235
2 parents c43dc97 + e0b4561 commit bd2ecb5

12 files changed (+265, -96 lines)

CHANGELOG.md

Lines changed: 6 additions & 0 deletions
@@ -5,6 +5,12 @@ SPDX-License-Identifier: MIT-0
 
 ## [Unreleased]
 
+### Added
+- **Optional Permissions Boundary Support for Enterprise Deployments**
+  - Added `PermissionsBoundaryArn` parameter to all CloudFormation templates for organizations with Service Control Policies (SCPs) requiring permissions boundaries
+  - Comprehensive support for both explicit IAM roles and implicit roles created by AWS SAM functions and state machines
+  - Conditional implementation ensures backward compatibility: when no permissions boundary is provided, roles deploy normally
+
 ## [0.3.8]
 
 ### Added
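The conditional behaviour described above rests on two CloudFormation pieces that appear throughout the template changes in this commit: the `HasPermissionsBoundary` condition and `PermissionsBoundary: !If [HasPermissionsBoundary, !Ref PermissionsBoundaryArn, !Ref AWS::NoValue]`. The Python sketch below models how those resolve for a single role. It is an illustration, not CloudFormation's evaluator, and the function names are hypothetical. The point it shows is that `AWS::NoValue` removes the property entirely, which is why an empty parameter leaves existing roles exactly as they were.

```python
from typing import Any, Dict


def has_permissions_boundary(permissions_boundary_arn: str) -> bool:
    # Mirrors the HasPermissionsBoundary condition:
    # !Not [!Equals [!Ref PermissionsBoundaryArn, ""]]
    return permissions_boundary_arn != ""


def apply_permissions_boundary(role_properties: Dict[str, Any],
                               permissions_boundary_arn: str) -> Dict[str, Any]:
    """Resolve PermissionsBoundary: !If [HasPermissionsBoundary,
    !Ref PermissionsBoundaryArn, !Ref AWS::NoValue] for one role.

    AWS::NoValue removes the property, so when the condition is false the
    key is simply absent rather than set to an empty string.
    """
    resolved = dict(role_properties)  # shallow copy; the input stays unchanged
    if has_permissions_boundary(permissions_boundary_arn):
        resolved["PermissionsBoundary"] = permissions_boundary_arn
    return resolved


existing = {"ManagedPolicyArns": [
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"]}

# Default deployment (empty parameter): role properties are unchanged.
print(apply_permissions_boundary(existing, ""))
# Enterprise deployment: the boundary ARN is attached.
print(apply_permissions_boundary(
    existing, "arn:aws:iam::123456789012:policy/MyPermissionsBoundary"))
```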

docs/aws-services-and-roles.md

Lines changed: 14 additions & 0 deletions
@@ -51,6 +51,20 @@ This document outlines the AWS services used by the GenAI Intelligent Document P
 
 ## IAM Role Requirements
 
+### Enterprise Deployment Considerations
+
+For organizations with Service Control Policies (SCPs) that mandate permissions boundaries on all IAM roles, the solution provides comprehensive support through the `PermissionsBoundaryArn` parameter. This optional parameter can be specified during deployment to attach a permissions boundary to all IAM roles (both explicit roles and implicit roles created by AWS SAM functions).
+
+**Usage:**
+```bash
+aws cloudformation deploy --stack-name <your-stack-name> \
+  --template-file template.yaml \
+  --parameter-overrides PermissionsBoundaryArn=arn:aws:iam::123456789012:policy/MyPermissionsBoundary \
+  --capabilities CAPABILITY_IAM
+```
+
+When no permissions boundary is specified, roles deploy normally, ensuring backward compatibility.
+
 ### Deployment Roles
 
 Deploying this solution requires an IAM role/user with the following permissions:
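After deploying with `PermissionsBoundaryArn`, one way to confirm the boundary actually landed on each role is to read it back with the IAM `GetRole` API, whose response includes a `PermissionsBoundary` element (with `PermissionsBoundaryArn`) for roles that have one. The Python sketch below assumes a boto3 IAM client (`boto3.client("iam")`) and a list of role names you collect yourself, for example from the stack's resources. The helper name `roles_missing_boundary` is hypothetical, and it takes the client as an argument so it can be exercised without AWS credentials.

```python
from typing import Any, Iterable, List


def roles_missing_boundary(iam_client: Any, role_names: Iterable[str],
                           expected_boundary_arn: str) -> List[str]:
    """Return the names of roles whose permissions boundary is absent or different.

    iam_client is expected to behave like boto3.client("iam"): get_role(RoleName=...)
    returns {"Role": {..., "PermissionsBoundary": {"PermissionsBoundaryArn": ...}}},
    with the PermissionsBoundary key omitted when no boundary is attached.
    """
    missing = []
    for name in role_names:
        role = iam_client.get_role(RoleName=name)["Role"]
        boundary = role.get("PermissionsBoundary", {}).get("PermissionsBoundaryArn")
        if boundary != expected_boundary_arn:
            missing.append(name)
    return missing


# Usage against a real account (requires credentials; the role name is a placeholder):
# import boto3
# iam = boto3.client("iam")
# print(roles_missing_boundary(iam, ["my-stack-SomeRole-ABC123"],
#                              "arn:aws:iam::123456789012:policy/MyPermissionsBoundary"))
```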

docs/well-architected.md

Lines changed: 2 additions & 1 deletion
@@ -30,6 +30,7 @@ The GenAI Intelligent Document Processing (GenAIIDP) Accelerator demonstrates st
 ### Strengths
 
 - **Defense in Depth**: Multiple security layers including IAM roles with least privilege, encryption at rest, and secure API access.
+- **Enterprise IAM Governance**: Comprehensive support for IAM permissions boundaries to comply with organizational Service Control Policies (SCPs) that mandate permissions boundaries on all IAM roles.
 - **Content Safety**: Integration with Amazon Bedrock Guardrails to enforce content policies, block sensitive information, and prevent model misuse.
 - **Authentication**: Cognito user pools with configurable password policies and MFA support.
 - **Authorization**: Fine-grained access controls for different components and resources.
@@ -146,4 +147,4 @@ The GenAI Intelligent Document Processing Accelerator demonstrates strong alignm
 
 Key strengths include the serverless architecture, which provides automatic scaling and resilience, and the comprehensive monitoring capabilities that enable operational visibility. The solution's modular design allows for customization and extension to meet specific business requirements.
 
-Areas for potential enhancement include more granular cost controls, multi-region resilience strategies, and sustainability optimizations. By addressing these recommendations, the solution can further improve its alignment with Well-Architected best practices.
+Areas for potential enhancement include more granular cost controls, multi-region resilience strategies, and sustainability optimizations. By addressing these recommendations, the solution can further improve its alignment with Well-Architected best practices.

memory-bank/activeContext.md

Lines changed: 65 additions & 91 deletions
@@ -2,111 +2,85 @@
 
 ## Current Task Focus
 
-**User Question**: Understanding OCR processing architecture for large PDFs (500+ pages) in the IDP accelerator, specifically:
-1. Is OCR processing sequential or distributed by page?
-2. How does Bedrock-only OCR deployment differ?
-3. What parts of the system run sequentially vs distributed?
-4. Handling massive PDFs with hundreds of forms without clear page boundaries
+**Customer Question**: "We are encountering difficulties deploying your IDP stack outside of a sandbox environment due to an organization-wide Service Control Policy (SCP). This policy mandates the attachment of a Permissions Boundary to any new role. Could you please inform us if it is possible to update the CloudFormation template to include a parameterized Permissions Boundary? Without this update, our ability to transition the code to production will be significantly impeded."
 
-## Key Findings
+**Task Status**: Implementation phase - Need to add Permissions Boundary parameter support to CloudFormation templates
 
-### OCR Processing Models
+## Problem Analysis
 
-The IDP accelerator uses **different processing models depending on the pattern**:
+### Current Situation
+- IDP stack creates numerous IAM roles across main template and pattern templates
+- Organization has SCP requiring Permissions Boundary on all new IAM roles
+- Current templates don't support Permissions Boundary configuration
+- Blocking production deployment
 
-#### Pattern 1 (BDA): Sequential Internal Processing
-- **OCR Approach**: Bedrock Data Automation handles everything internally
-- **Processing**: Entire document processed as single unit by BDA service
-- **Concurrency**: Not user-controllable, managed by BDA
-- **Large Documents**: Subject to BDA service limits and timeouts
+### Affected Templates
+- **Main Template**: `template.yaml` - ~15 IAM roles
+- **Pattern 1**: `patterns/pattern-1/template.yaml` - ~8 IAM roles
+- **Pattern 2**: `patterns/pattern-2/template.yaml` - ~6 roles
+- **Pattern 3**: `patterns/pattern-3/template.yaml` - ~5 roles
+- **Options**: `options/bda-lending-project/template.yaml`, `options/bedrockkb/template.yaml`
 
-#### Pattern 2/3 (Textract + Bedrock): Distributed Page Processing
-- **OCR Approach**: AWS Textract with concurrent page processing
-- **Processing**: **Pages processed in parallel** using ThreadPoolExecutor
-- **Concurrency**: Configurable (default: 20 concurrent workers)
-- **Large Documents**: Optimal for 500+ page documents
+## Solution Design
 
-### Sequential vs Distributed Components
+### Approach: Parameterized Permissions Boundary
+1. **Add optional parameter** to main template for Permissions Boundary ARN
+2. **Conditionally apply boundary** to all IAM roles when provided
+3. **Maintain backward compatibility** for deployments without boundaries
+4. **Cascade parameter** to all nested pattern stacks
 
-#### Sequential Processing:
-1. **Step Functions Workflow**: OCR → Classification → Extraction → Assessment → Summarization
-2. **Classification**: Analyzes all pages to create document boundaries
-3. **BDA Internal Processing**: Everything handled as single unit
+### Implementation Plan
 
-#### Distributed Processing:
-1. **OCR Pages (Pattern 2/3)**: Up to 20 pages processed simultaneously
-2. **Extraction Sections**: Up to 10 document sections processed in parallel
-3. **Independent API Calls**: Each page makes separate Textract calls
+#### Step 1: Main Template Updates (`template.yaml`)
+- Add `PermissionsBoundaryArn` parameter
+- Add `HasPermissionsBoundary` condition
+- Update all IAM role resources with conditional boundary
+- Pass parameter to nested stacks
+- Update CloudFormation interface metadata
 
-## Customer Scenario Analysis
+#### Step 2: Pattern Template Updates
+- Add parameter to each pattern template
+- Update all IAM roles in patterns
+- Maintain consistency across all patterns
 
-### 500+ Page PDF with Multiple Forms
+#### Step 3: Options Template Updates
+- Update BDA lending project template
+- Update Bedrock KB template
 
-**Challenge**: Single PDF containing hundreds of forms without clear page boundaries
+### Key Implementation Details
 
-**Recommended Approach**: Pattern 2 or 3 for optimal performance
-
-**Why Pattern 2/3 is Better**:
-- **Page-Level Parallelism**: 500 pages processed 20 at a time
-- **Memory Efficiency**: Individual pages loaded, not entire document
-- **Fault Tolerance**: Page failures don't stop entire processing
-- **Granular Control**: Can optimize per-page processing
-
-**Classification Strategy**:
-- Use "holistic" classification method to analyze entire document
-- Creates logical sections grouping related pages
-- Handles form boundaries that don't align with page boundaries
-
-## Technical Implementation Details
+**Parameter Definition:**
+```yaml
+PermissionsBoundaryArn:
+  Type: String
+  Default: ""
+  Description: (Optional) ARN of IAM Permissions Boundary policy
+  AllowedPattern: "^(|arn:aws:iam::[0-9]{12}:policy/.+)$"
+```
 
-### OCR Service Configuration for Large Documents
+**Condition:**
+```yaml
+HasPermissionsBoundary: !Not [!Equals [!Ref PermissionsBoundaryArn, ""]]
+```
 
+**Role Update Pattern:**
 ```yaml
-ocr:
-  backend: "textract"
-  max_workers: 20 # Increase for more parallelism
-  image:
-    dpi: 150 # Balance quality vs processing time
-    target_width: 1024
-    target_height: 1024
-  features:
-    - name: "LAYOUT"
-    - name: "TABLES"
-    - name: "FORMS"
+SomeRole:
+  Type: AWS::IAM::Role
+  Properties:
+    # existing properties...
+    PermissionsBoundary: !If [HasPermissionsBoundary, !Ref PermissionsBoundaryArn, !Ref AWS::NoValue]
 ```
 
-### Processing Flow for Large PDFs
-
-1. **Document Load**: PyMuPDF loads PDF structure
-2. **Page Distribution**: ThreadPoolExecutor creates 20 concurrent workers
-3. **Parallel OCR**: Each page processed independently via Textract
-4. **Result Assembly**: Pages sorted and combined into document structure
-5. **Classification**: Holistic analysis creates logical document sections
-6. **Parallel Extraction**: Sections processed concurrently (MaxConcurrency: 10)
-
-## Performance Implications
-
-### For 500-Page Document:
-- **Pattern 1 (BDA)**: Single job, BDA-managed processing
-- **Pattern 2/3**: ~25 batches of 20 pages each, highly parallelized
-
-### Bottlenecks to Consider:
-1. **Textract Rate Limits**: May need to adjust max_workers
-2. **Memory Usage**: 20 concurrent pages require significant memory
-3. **S3 Operations**: Parallel uploads/downloads for page results
-4. **Lambda Timeouts**: Ensure sufficient timeout for large documents
-
-## Next Steps and Considerations
-
-### For Customer Implementation:
-1. **Choose Pattern 2 or 3** for large document processing
-2. **Configure max_workers** based on Textract limits and memory
-3. **Use holistic classification** to handle form boundaries
-4. **Monitor memory usage** during processing
-5. **Consider document splitting** if single PDF approach is problematic
-
-### Optimization Opportunities:
-- **Adaptive Concurrency**: Adjust workers based on document size
-- **Progressive Processing**: Start classification while OCR continues
-- **Caching Strategy**: Cache page images for reprocessing
-- **Error Recovery**: Implement page-level retry with exponential backoff
+## Benefits
+- **SCP Compliance**: Satisfies organizational requirements
+- **Backward Compatible**: Existing deployments unaffected
+- **Flexible**: Works with any Permissions Boundary policy
+- **Comprehensive**: Covers all IAM roles across all components
+
+## Next Steps
+1. Implement main template changes
+2. Update all pattern templates
+3. Update options templates
+4. Test deployment scenarios
+5. Document usage examples
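The `AllowedPattern` in the parameter definition above, `^(|arn:aws:iam::[0-9]{12}:policy/.+)$`, accepts either an empty string or a customer-managed policy ARN in the standard `aws` partition. A quick way to sanity-check candidate values before deploying is to run the same expression locally, as in the Python sketch below. Python's `re` stands in for CloudFormation's own pattern matching, which should agree for a simple expression like this one, and the helper name is hypothetical. Running it also makes two consequences of the pattern visible: AWS-managed policy ARNs (account field `aws` rather than 12 digits) and ARNs in other partitions such as `aws-us-gov` are rejected.

```python
import re

# Same expression as the AllowedPattern on the PermissionsBoundaryArn parameter.
PERMISSIONS_BOUNDARY_PATTERN = re.compile(r"^(|arn:aws:iam::[0-9]{12}:policy/.+)$")


def is_allowed_boundary_arn(value: str) -> bool:
    """True if value would satisfy the AllowedPattern (empty, or an aws-partition policy ARN)."""
    return PERMISSIONS_BOUNDARY_PATTERN.fullmatch(value) is not None


for candidate in [
    "",                                                        # no boundary: allowed
    "arn:aws:iam::123456789012:policy/MyPermissionsBoundary",  # allowed
    "arn:aws:iam::123456789012:policy/boundaries/Team-A",      # policy path: allowed
    "arn:aws:iam::aws:policy/PowerUserAccess",                 # AWS-managed: rejected
    "arn:aws-us-gov:iam::123456789012:policy/MyBoundary",      # other partition: rejected
    "MyPermissionsBoundary",                                   # bare name: rejected
]:
    print(repr(candidate), is_allowed_boundary_arn(candidate))
```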

options/bda-lending-project/template.yaml

Lines changed: 16 additions & 1 deletion
@@ -27,6 +27,20 @@ Parameters:
       - CRITICAL
     Description: Default logging level for Lambda functions
 
+  PermissionsBoundaryArn:
+    Type: String
+    Default: ""
+    Description: >-
+      (Optional) ARN of an existing IAM Permissions Boundary policy to attach to all IAM roles.
+      Required by some organizations with Service Control Policies (SCPs).
+      Format: arn:aws:iam::account-id:policy/policy-name
+      Leave blank if no Permissions Boundary is required.
+    AllowedPattern: "^(|arn:aws:iam::[0-9]{12}:policy/.+)$"
+    ConstraintDescription: Must be empty or a valid IAM policy ARN
+
+Conditions:
+  HasPermissionsBoundary: !Not [!Equals [!Ref PermissionsBoundaryArn, ""]]
+
 Resources:
 
   # IAM role for Lambda function
@@ -47,6 +61,7 @@ Resources:
             Action: sts:AssumeRole
       ManagedPolicyArns:
         - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
+      PermissionsBoundary: !If [HasPermissionsBoundary, !Ref PermissionsBoundaryArn, !Ref AWS::NoValue]
       Policies:
         - PolicyName: BedrockDataAutomationAccess
           PolicyDocument:
@@ -143,4 +158,4 @@ Outputs:
 
   BlueprintArns:
     Description: ARNs of the blueprints added to the project
-    Value: !Join [", ", !GetAtt BDAProject.blueprintArns]
+    Value: !Join [", ", !GetAtt BDAProject.blueprintArns]
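Each option and pattern template now declares its own `PermissionsBoundaryArn` parameter with an empty default, so if such a template is deployed as a nested stack and the parent forgets to forward the value, that stack creates its roles without a boundary, which an SCP like the one described would reject. The plan in `memory-bank/activeContext.md` calls for passing the parameter to nested stacks; a rough check for that over an already-parsed parent template is sketched below in Python. The function name and the logical IDs in the example are hypothetical, and the sketch assumes nested stacks are declared as `AWS::CloudFormation::Stack` or `AWS::Serverless::Application` resources whose `Parameters` map is where the value is forwarded.

```python
from typing import Any, Dict, List

NESTED_STACK_TYPES = {"AWS::CloudFormation::Stack", "AWS::Serverless::Application"}


def nested_stacks_not_forwarding(template: Dict[str, Any],
                                 parameter_name: str = "PermissionsBoundaryArn") -> List[str]:
    """Return logical IDs of nested stacks that do not pass parameter_name through.

    "Forwarded" here means the nested stack sets the parameter to {"Ref": parameter_name};
    a missing entry or a hard-coded value is reported.
    """
    offenders = []
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") not in NESTED_STACK_TYPES:
            continue
        params = (resource.get("Properties") or {}).get("Parameters") or {}
        if params.get(parameter_name) != {"Ref": parameter_name}:
            offenders.append(logical_id)
    return offenders


# Hypothetical parent template, written in long-form (JSON-style) intrinsics.
parent = {
    "Parameters": {"PermissionsBoundaryArn": {"Type": "String", "Default": ""}},
    "Resources": {
        "PatternStack": {"Type": "AWS::CloudFormation::Stack",
                         "Properties": {"Parameters": {
                             "PermissionsBoundaryArn": {"Ref": "PermissionsBoundaryArn"}}}},
        "OptionStack": {"Type": "AWS::CloudFormation::Stack",
                        "Properties": {"Parameters": {}}},
    },
}
print(nested_stacks_not_forwarding(parent))  # ['OptionStack']
```

Requiring an exact `Ref` keeps the check strict; a parent that deliberately wraps the value in another intrinsic would be flagged and can be reviewed by hand.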

options/bedrockkb/template.yaml

Lines changed: 19 additions & 0 deletions
@@ -123,6 +123,17 @@ Parameters:
     Type: String
     Default: AMAZON_BEDROCK_TEXT_CHUNK
 
+  PermissionsBoundaryArn:
+    Type: String
+    Default: ""
+    Description: >-
+      (Optional) ARN of an existing IAM Permissions Boundary policy to attach to all IAM roles.
+      Required by some organizations with Service Control Policies (SCPs).
+      Format: arn:aws:iam::account-id:policy/policy-name
+      Leave blank if no Permissions Boundary is required.
+    AllowedPattern: "^(|arn:aws:iam::[0-9]{12}:policy/.+)$"
+    ConstraintDescription: Must be empty or a valid IAM policy ARN
+
 Metadata:
   AWS::CloudFormation::Interface:
     ParameterGroups:
@@ -228,6 +239,7 @@ Conditions:
     Fn::Or:
       - Condition: IsChunkingStrategyFixed
       - Condition: IsChunkingStrategyDefault
+  HasPermissionsBoundary: !Not [!Equals [!Ref PermissionsBoundaryArn, ""]]
 
 Resources:
   # Custom resource to transform input to lowercase.
@@ -245,6 +257,7 @@ Resources:
     # checkov:skip=CKV_AWS_115: "Function does not require reserved concurrency as it scales based on demand"
     # checkov:skip=CKV_AWS_173: "Environment variables do not contain sensitive data - only configuration values like feature flags and non-sensitive settings"
     Properties:
+      PermissionsBoundary: !If [HasPermissionsBoundary, !Ref PermissionsBoundaryArn, !Ref AWS::NoValue]
       Handler: index.handler
       Runtime: python3.12
       Timeout: 30
@@ -290,6 +303,7 @@ Resources:
     # checkov:skip=CKV_AWS_115: "Function does not require reserved concurrency as it scales based on demand"
     # checkov:skip=CKV_AWS_173: "Environment variables do not contain sensitive data - only configuration values like feature flags and non-sensitive settings"
     Properties:
+      PermissionsBoundary: !If [HasPermissionsBoundary, !Ref PermissionsBoundaryArn, !Ref AWS::NoValue]
       Handler: index.handler
       Runtime: python3.12
       Timeout: 30
@@ -388,6 +402,7 @@ Resources:
                 - lambda.amazonaws.com
             Action:
               - sts:AssumeRole
+      PermissionsBoundary: !If [HasPermissionsBoundary, !Ref PermissionsBoundaryArn, !Ref AWS::NoValue]
       Policies:
         - PolicyName: OSSLambdaRoleDefaultPolicy # Reference: https://docs.aws.amazon.com/service-authorization/latest/reference/list_awsx-ray.html
           PolicyDocument:
@@ -465,6 +480,7 @@ Resources:
     # checkov:skip=CKV_AWS_115: "Function does not require reserved concurrency as it scales based on demand"
     # checkov:skip=CKV_AWS_173: "Environment variables do not contain sensitive data - only configuration values like feature flags and non-sensitive settings"
     Properties:
+      PermissionsBoundary: !If [HasPermissionsBoundary, !Ref PermissionsBoundaryArn, !Ref AWS::NoValue]
       Handler: oss_handler.lambda_handler
       MemorySize: 1024
       Role: !GetAtt OpenSearchLambdaExecutionRole.Arn
@@ -515,6 +531,7 @@ Resources:
                 aws:SourceAccount: !Sub ${AWS::AccountId}
               ArnLike:
                 aws:SourceArn: !Sub arn:aws:bedrock:${AWS::Region}:${AWS::AccountId}:knowledge-base/*
+      PermissionsBoundary: !If [HasPermissionsBoundary, !Ref PermissionsBoundaryArn, !Ref AWS::NoValue]
       Policies:
         - PolicyName: bedrock-invoke-model
           PolicyDocument:
@@ -711,6 +728,7 @@ Resources:
             Action: sts:AssumeRole
       ManagedPolicyArns:
         - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
+      PermissionsBoundary: !If [HasPermissionsBoundary, !Ref PermissionsBoundaryArn, !Ref AWS::NoValue]
       Policies:
         - PolicyDocument:
             Version: 2012-10-17
@@ -812,6 +830,7 @@ Resources:
             Principal:
               Service: scheduler.amazonaws.com
             Action: sts:AssumeRole
+      PermissionsBoundary: !If [HasPermissionsBoundary, !Ref PermissionsBoundaryArn, !Ref AWS::NoValue]
       Policies:
         - PolicyName: BedrockAgentStartIngestionPolicy
           PolicyDocument:
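Covering "all IAM roles" means both explicit `AWS::IAM::Role` resources, like `OpenSearchLambdaExecutionRole` above, and roles that AWS SAM generates for functions and state machines. A rough way to catch a role that was missed is to walk a parsed template's `Resources`, as in the Python sketch below. The function name is hypothetical, and it expects a template already converted to a dictionary with long-form intrinsics such as `Fn::If`, since parsing short-form tags like `!If` needs a custom YAML loader. It skips SAM functions and state machines that set `Role`, because the AWS SAM documentation describes their `PermissionsBoundary` property as applying only to a role SAM generates. By that reading, the function-level boundary on the function that uses `Role: !GetAtt OpenSearchLambdaExecutionRole.Arn` is redundant, and the boundary on the role itself is what takes effect.

```python
from typing import Any, Dict, List

# Resource types whose execution role SAM generates when no Role is given (assumed list).
SAM_ROLE_GENERATING_TYPES = {"AWS::Serverless::Function", "AWS::Serverless::StateMachine"}


def resources_missing_boundary(template: Dict[str, Any]) -> List[str]:
    """Return logical IDs of role-bearing resources with no PermissionsBoundary property.

    - AWS::IAM::Role must always declare PermissionsBoundary.
    - SAM functions/state machines only need it when SAM generates the role,
      i.e. when no explicit Role is given (an explicit role is checked on its own).
    - A SAM Globals section is not considered by this sketch.
    """
    missing = []
    for logical_id, resource in template.get("Resources", {}).items():
        rtype = resource.get("Type")
        props = resource.get("Properties", {}) or {}
        if rtype == "AWS::IAM::Role":
            needs_boundary = True
        elif rtype in SAM_ROLE_GENERATING_TYPES:
            needs_boundary = "Role" not in props
        else:
            needs_boundary = False
        if needs_boundary and "PermissionsBoundary" not in props:
            missing.append(logical_id)
    return missing


# Example: the conditional boundary written as a parsed Fn::If structure.
boundary = {"Fn::If": ["HasPermissionsBoundary",
                       {"Ref": "PermissionsBoundaryArn"},
                       {"Ref": "AWS::NoValue"}]}
template = {
    "Resources": {
        "LambdaRole": {"Type": "AWS::IAM::Role",
                       "Properties": {"PermissionsBoundary": boundary}},
        "ForgottenRole": {"Type": "AWS::IAM::Role", "Properties": {}},
        "OpenSearchFunction": {"Type": "AWS::Serverless::Function",
                               "Properties": {"Role": {"Fn::GetAtt": ["LambdaRole", "Arn"]}}},
        "ImplicitRoleFunction": {"Type": "AWS::Serverless::Function",
                                 "Properties": {"Handler": "index.handler"}},
    }
}
print(resources_missing_boundary(template))  # ['ForgottenRole', 'ImplicitRoleFunction']
```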
