-
Notifications
You must be signed in to change notification settings - Fork 721
Description
Is your idea related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
My issue is I get a lot of GetWorkgroup throttling in my account, leading to eventual failures in some of my services. This happens because lambdas using wrangler are invoked in high-frequency, and invoke GetWorkgroup as part of the flow for any and all Athena queries.
In aws-sdk-pandas's Athena code, _get_workgroup_config is regularly invoked before _start_query_execution, and the resulting _WorkGroupConfig is (exclusively as far as I can see) used to override a few client_athena.start_query_execution arguments, namely the output location (S3 bucket), the encryption and the KMS. However, this is redundant - if we omit the explicit setting of these arguments, the workgroup settings in AWS will be used (according to https://docs.aws.amazon.com/athena/latest/APIReference/API_StartQueryExecution.html#API_StartQueryExecution_RequestSyntax and the ResultConfiguration documentation). In fact, currently this code does nothing - it calls GetWorkgroup in order to 'override' the (server) workgroup definitions with (previusly obtained) workgroup definitions.
Describe the solution you'd like
A clear and concise description of what you are expecting.
I suggest one of 2 options:
- If this function is indeed redundant (possibly a leftover from a time where AWS Athena workgroup did not include some of these definitions), it can be removed entirely
- If I missed the use case of this function, I would like an option to configure wrangler to not use it. I think in most cases these definitions should be set in the server, and not in the client, so unless you have a special case where you want to override the server set up, it should be avoided. If you do want to override it, you probably don't want to do it implicitly with whatever is defined on the workgroup, so in this scenario it needs to be explicitly set
P.S. Please do not attach files as it's considered a security risk. Add code snippets directly in the message body as much as possible.
Relevant code:
The use of the workgroup config:
def _start_query_execution(...
...
# s3_output
args["ResultConfiguration"] = {
"OutputLocation": _get_s3_output(s3_output=s3_output, wg_config=wg_config, boto3_session=boto3_session)
}
# encryption
if wg_config.enforced is True:
if wg_config.encryption is not None:
args["ResultConfiguration"]["EncryptionConfiguration"] = {"EncryptionOption": wg_config.encryption}
if wg_config.kms_key is not None:
args["ResultConfiguration"]["EncryptionConfiguration"]["KmsKey"] = wg_config.kms_key
The get workgroup code:
def _get_workgroup_config(session: boto3.Session | None = None, workgroup: str = "primary") -> _WorkGroupConfig:
enforced: bool
wg_s3_output: str | None
wg_encryption: str | None
wg_kms_key: str | None
enforced, wg_s3_output, wg_encryption, wg_kms_key = False, None, None, None
if workgroup is not None:
res = get_work_group(workgroup=workgroup, boto3_session=session)
enforced = res["WorkGroup"]["Configuration"]["EnforceWorkGroupConfiguration"]
config: dict[str, Any] = res["WorkGroup"]["Configuration"].get("ResultConfiguration")
if config is not None:
wg_s3_output = config.get("OutputLocation")
encrypt_config: dict[str, str] | None = config.get("EncryptionConfiguration")
wg_encryption = None if encrypt_config is None else encrypt_config.get("EncryptionOption")
wg_kms_key = None if encrypt_config is None else encrypt_config.get("KmsKey")
wg_config: _WorkGroupConfig = _WorkGroupConfig(
enforced=enforced, s3_output=wg_s3_output, encryption=wg_encryption, kms_key=wg_kms_key
)
_logger.debug("Workgroup config:\n%s", wg_config)
return wg_config