Skip to content

Athena GetWorkgroup is called unnecessarily - there should be an option to avoid that #3233

@inbar-beehero

Description

@inbar-beehero

Is your idea related to a problem? Please describe.
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

My issue is I get a lot of GetWorkgroup throttling in my account, leading to eventual failures in some of my services. This happens because lambdas using wrangler are invoked in high-frequency, and invoke GetWorkgroup as part of the flow for any and all Athena queries.

In aws-sdk-pandas's Athena code, _get_workgroup_config‎ is regularly invoked before _start_query_execution, and the resulting _WorkGroupConfig is (exclusively as far as I can see) used to override a few client_athena.start_query_execution arguments, namely the output location (S3 bucket), the encryption and the KMS. However, this is redundant - if we omit the explicit setting of these arguments, the workgroup settings in AWS will be used (according to https://docs.aws.amazon.com/athena/latest/APIReference/API_StartQueryExecution.html#API_StartQueryExecution_RequestSyntax and the ResultConfiguration documentation). In fact, currently this code does nothing - it calls GetWorkgroup in order to 'override' the (server) workgroup definitions with (previusly obtained) workgroup definitions.

Describe the solution you'd like
A clear and concise description of what you are expecting.

I suggest one of 2 options:

  1. If this function is indeed redundant (possibly a leftover from a time where AWS Athena workgroup did not include some of these definitions), it can be removed entirely
  2. If I missed the use case of this function, I would like an option to configure wrangler to not use it. I think in most cases these definitions should be set in the server, and not in the client, so unless you have a special case where you want to override the server set up, it should be avoided. If you do want to override it, you probably don't want to do it implicitly with whatever is defined on the workgroup, so in this scenario it needs to be explicitly set

P.S. Please do not attach files as it's considered a security risk. Add code snippets directly in the message body as much as possible.

Relevant code:
The use of the workgroup config:

def _start_query_execution(...
...
    # s3_output
    args["ResultConfiguration"] = {
        "OutputLocation": _get_s3_output(s3_output=s3_output, wg_config=wg_config, boto3_session=boto3_session)
    }

    # encryption
    if wg_config.enforced is True:
        if wg_config.encryption is not None:
            args["ResultConfiguration"]["EncryptionConfiguration"] = {"EncryptionOption": wg_config.encryption}
            if wg_config.kms_key is not None:
                args["ResultConfiguration"]["EncryptionConfiguration"]["KmsKey"] = wg_config.kms_key

The get workgroup code:

def _get_workgroup_config(session: boto3.Session | None = None, workgroup: str = "primary") -> _WorkGroupConfig:
    enforced: bool
    wg_s3_output: str | None
    wg_encryption: str | None
    wg_kms_key: str | None

    enforced, wg_s3_output, wg_encryption, wg_kms_key = False, None, None, None
    if workgroup is not None:
        res = get_work_group(workgroup=workgroup, boto3_session=session)
        enforced = res["WorkGroup"]["Configuration"]["EnforceWorkGroupConfiguration"]
        config: dict[str, Any] = res["WorkGroup"]["Configuration"].get("ResultConfiguration")
        if config is not None:
            wg_s3_output = config.get("OutputLocation")
            encrypt_config: dict[str, str] | None = config.get("EncryptionConfiguration")
            wg_encryption = None if encrypt_config is None else encrypt_config.get("EncryptionOption")
            wg_kms_key = None if encrypt_config is None else encrypt_config.get("KmsKey")
    wg_config: _WorkGroupConfig = _WorkGroupConfig(
        enforced=enforced, s3_output=wg_s3_output, encryption=wg_encryption, kms_key=wg_kms_key
    )
    _logger.debug("Workgroup config:\n%s", wg_config)
    return wg_config

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions