Description
Feature Request / Improvement
Description:
Problem:
BigQueryMetastoreCatalog supports only Application Default Credentials (ADC) and provides no mechanism for service account impersonation. This prevents identity separation between cluster operations and data access.
Current Behavior:
Dataproc cluster (cluster-sa)
↓
BigQuery Metastore operations: Always uses cluster-sa
There is no way to configure impersonation. This forces the cluster service account to hold both infrastructure and data permissions, which prevents multi-tenancy and proper audit trails.
Example Failure:
Runtime: Spark on Dataproc as dataproc-sa@project.iam
Desired: Access tables as data-sa@project.iam
Result: All BigQuery Metastore calls use dataproc-sa
Cannot separate operational permissions from data access
Impact:
Without impersonation support, organizations cannot implement least-privilege security or run multi-tenant workloads on shared clusters, which are standard requirements for production deployments.
AWS Comparison:
Iceberg already supports this for AWS via AssumeRoleAwsClientFactory, which provides a consistent identity for both Glue Metastore access and S3 data access.
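For comparison, the AWS side is driven entirely by catalog properties. The sketch below builds that property map in Java. The keys `client.factory`, `client.assume-role.arn`, and `client.assume-role.region` are the ones documented for Iceberg's AWS integration. The helper class `AssumeRoleCatalogProperties` and any role ARN passed to it are illustrative assumptions, not part of Iceberg.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Iceberg): collects the catalog properties
// that select AssumeRoleAwsClientFactory on the AWS side.
final class AssumeRoleCatalogProperties {
  static Map<String, String> build(String roleArn, String region) {
    Map<String, String> props = new HashMap<>();
    // Selects the client factory implementation used for Glue and S3 clients.
    props.put("client.factory", "org.apache.iceberg.aws.AssumeRoleAwsClientFactory");
    // Role assumed for all catalog and data access.
    props.put("client.assume-role.arn", roleArn);
    // Region in which the assumed-role clients operate.
    props.put("client.assume-role.region", region);
    return props;
  }
}
```

In a Spark job these keys would typically be set under the catalog prefix, for example `spark.sql.catalog.<catalog-name>.client.factory`.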
Proposed Solution:
Add a pluggable factory pattern (like AwsClientFactory) for BigQuery client creation, with impersonation support built on Google's ImpersonatedCredentials API.
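One possible shape for this proposal is sketched below, using only the JDK so it runs standalone. Every name in it is a hypothetical placeholder rather than an existing Iceberg API: the interfaces `BigQueryClientFactory` and `CredentialsSource`, the loader `BigQueryClientFactories`, and the property keys `gcp.bigquery.client.factory` and `gcp.bigquery.impersonate.service-account`. `CredentialsSource` stands in for Google credentials. A real implementation would presumably wrap Application Default Credentials with `ImpersonatedCredentials.create(...)` from google-auth-library at the marked spot.

```java
import java.util.Map;

// Hypothetical stand-in for the Google credentials a BigQuery client is built with.
interface CredentialsSource {
  String principal();
}

// Hypothetical pluggable factory, playing the role AwsClientFactory plays for AWS.
interface BigQueryClientFactory {
  void initialize(Map<String, String> properties);

  CredentialsSource credentials();
}

// Default behavior: keep using Application Default Credentials (today's behavior).
final class DefaultBigQueryClientFactory implements BigQueryClientFactory {
  @Override
  public void initialize(Map<String, String> properties) {}

  @Override
  public CredentialsSource credentials() {
    return () -> "application-default";
  }
}

// Impersonating variant: acts as the configured target service account.
final class ImpersonatingBigQueryClientFactory implements BigQueryClientFactory {
  // Hypothetical property key, chosen for illustration only.
  static final String TARGET = "gcp.bigquery.impersonate.service-account";

  private String target;

  @Override
  public void initialize(Map<String, String> properties) {
    target = properties.get(TARGET);
    if (target == null || target.isEmpty()) {
      throw new IllegalArgumentException("Missing required property: " + TARGET);
    }
  }

  @Override
  public CredentialsSource credentials() {
    // Assumption: a real implementation would wrap ADC here, e.g. with
    // ImpersonatedCredentials.create(sourceCredentials, target, delegates, scopes, lifetime).
    return () -> target;
  }
}

// Loader that picks the factory from catalog properties, defaulting to ADC.
final class BigQueryClientFactories {
  // Hypothetical property key, chosen for illustration only.
  static final String FACTORY_IMPL = "gcp.bigquery.client.factory";

  static BigQueryClientFactory from(Map<String, String> properties) {
    String impl = properties.get(FACTORY_IMPL);
    BigQueryClientFactory factory;
    if (impl == null) {
      factory = new DefaultBigQueryClientFactory();
    } else {
      try {
        // Instantiate the configured class through its no-arg constructor.
        factory = (BigQueryClientFactory)
            Class.forName(impl).getDeclaredConstructor().newInstance();
      } catch (ReflectiveOperationException | ClassCastException e) {
        throw new IllegalArgumentException("Cannot load client factory: " + impl, e);
      }
    }
    factory.initialize(properties);
    return factory;
  }
}
```

With this shape, a job running as dataproc-sa could set the factory property to the impersonating implementation and the target property to data-sa, so metastore calls are made as data-sa, while jobs that set neither property keep the current ADC behavior.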
References:
Query engine
None
Willingness to contribute
- I can contribute this improvement/feature independently
- I would be willing to contribute this improvement/feature with guidance from the Iceberg community
- I cannot contribute this improvement/feature at this time