(0.12.4 cherry-pick): aws: support copies >5GB #562
base: release/0.12
Conversation
See apache#339. The old name is kept as a deprecated alias for now.
This used to work once but got broken during refactoring; a test was added to catch regressions. See apache#48, where -- I think -- we decided that instead of printing entire error chains in `Display`, API users should walk the error chain themselves. This is especially relevant for error types that we do NOT control, like upstream `reqwest` types.
`429 Too Many Requests` is the status code HTTP uses to indicate that the client has sent too many requests in a given amount of time. It may sound counter-intuitive to retry in this situation, but that is exactly why we use an exponential backoff mechanism: it gives the server the opportunity to recover without failing requests immediately. The retry mechanism already works for object stores like S3 because they return a server error. Without this change, however, GCS is not handled properly, because it returns the client error `429 Too Many Requests` instead. This change enables retries on that response too. A more advanced retry mechanism would honor the optional `Retry-After` response header, but that is beyond the scope of this PR. Closes: apache#309
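The retry classification described above can be sketched as follows. This is a minimal illustration, not the crate's actual API: `is_retryable` and `backoff` are hypothetical helper names, and the delay values are arbitrary.

```rust
use std::time::Duration;

// Hypothetical helper: treat server errors (5xx) and 429 Too Many Requests
// as retryable, as this change does for GCS.
fn is_retryable(status: u16) -> bool {
    status == 429 || (500..600).contains(&status)
}

// Exponential backoff: the base delay is doubled per attempt, capped at `max`,
// giving the server time to recover between retries.
fn backoff(attempt: u32, base: Duration, max: Duration) -> Duration {
    base.saturating_mul(2u32.saturating_pow(attempt)).min(max)
}
```

A caller would loop while `is_retryable(status)` holds, sleeping for `backoff(attempt, ..)` between attempts until a retry budget is exhausted.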
* fix: update links in release docs and script * Remove another reference to the previous crate setup
* Add (failing) test for retrying connection errors * Fix not retrying connection errors closes apache#368 * Fix clippy error --------- Co-authored-by: John Garland <john.garland@vivcourt.com>
Re-export `HeaderMap`, `HeaderValue`, and `Extensions` from http crate to avoid forcing users to add http dependency when using object_store public API. Fixes apache#263
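The re-export pattern can be illustrated with a minimal sketch. The `http_shim` module and its unit structs below are stand-ins, not the real `http` crate or object_store's actual module layout:

```rust
// Sketch of re-exporting a dependency's types so downstream users don't
// need a direct dependency of their own.
mod http_shim {
    // Stand-ins for the `http` crate's types.
    pub struct HeaderMap;
    pub struct HeaderValue;
    pub struct Extensions;
}

// Users can now write `mycrate::HeaderMap` instead of adding `http`
// to their own Cargo.toml.
pub use http_shim::{Extensions, HeaderMap, HeaderValue};
```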
…32 (apache#468) * chore: fix some clippy 1.89 warnings * fix another warning * Skip some doctests for wasm32
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](actions/checkout@v4...v5) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '5' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](actions/setup-python@v5...v6) --- updated-dependencies: - dependency-name: actions/setup-python dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 4 to 5. - [Release notes](https://github.com/actions/setup-node/releases) - [Commits](actions/setup-node@v4...v5) --- updated-dependencies: - dependency-name: actions/setup-node dependency-version: '5' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/github-script](https://github.com/actions/github-script) from 7 to 8. - [Release notes](https://github.com/actions/github-script/releases) - [Commits](actions/github-script@v7...v8) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: '8' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ache#487) On a request retry, an info message is logged stating that an error was encountered, along with information about the retry process, but it has not included any details about the error causing the retry. This PR updates the logging to include the status code if it is a server error, or the HTTP error kind if a transport error occurred. While the last error when retries are exhausted is returned up the call stack, the intermediate errors need not be exactly the same, so it is helpful to include some minimal information about what triggered each retry.
…he#436) These log messages are very noisy.
* Add storage class for aws and gcp * Add azure storage class attribute * Update attribute docs * Update http client
…guration (apache#480) * Allow setting STS endpoint via env var * Properly use AmazonS3Builder::credentials_from_env for AssumeRoleWithWebIdentity auth flow --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Update version to 0.12.4 * Update update_changelog.sh script * Update changelog * Last touchups * Update changelog
```rust
let multipart_copy_threshold = self
    .multipart_copy_threshold
    .map(|val| val.get())
    .transpose()?
    .unwrap_or(MAX_SINGLE_REQUEST_COPY_SIZE);
let multipart_copy_part_size = self
    .multipart_copy_part_size
    .map(|val| val.get())
    .transpose()?
    .unwrap_or(MAX_SINGLE_REQUEST_COPY_SIZE);
```
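The option-resolution chain in this diff (a fallible user override falling back to a default) can be sketched in isolation. The function name, error type, and the 5 GiB constant here are illustrative:

```rust
// Illustrative default: S3's documented single-request copy limit (5 GiB).
const MAX_SINGLE_REQUEST_COPY_SIZE: u64 = 5 * 1024 * 1024 * 1024;

// Resolve an optional, fallibly-parsed user setting to a concrete value,
// mirroring the `.map(..).transpose()?.unwrap_or(..)` chain in the diff.
fn resolve_threshold(user_value: Option<Result<u64, String>>) -> Result<u64, String> {
    let threshold = user_value
        .transpose()? // Option<Result<T, E>> -> Result<Option<T>, E>, then `?`
        .unwrap_or(MAX_SINGLE_REQUEST_COPY_SIZE);
    Ok(threshold)
}
```

`transpose` turns "maybe a parse result" into "a result of maybe a value", so the `?` operator can surface a parse error while `None` simply falls through to the default.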
Clamp to 5GiB because that's the documented maximum?
I think if someone wants to push it over 5GB, they should be able to. There are many "s3-compatible" object stores that might not share the same limitations.
```rust
// Determine source size to decide between single CopyObject and multipart copy
let head_meta = self
    .client
    .get_opts(
        from,
        GetOptions {
            head: true,
            ..Default::default()
        },
    )
    .await?
    .meta;
```
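The size retrieved by this HEAD request feeds the choice between a single `CopyObject` and a multipart copy. A simplified sketch of that decision, with illustrative names and part-count arithmetic:

```rust
#[derive(Debug, PartialEq)]
enum CopyStrategy {
    Single,
    Multipart { num_parts: u64 },
}

// Choose single vs. multipart copy based on the HEAD-reported object size.
// Objects at or below the threshold use one CopyObject request; larger
// objects are copied in `part_size` chunks via UploadPartCopy.
fn choose_strategy(object_size: u64, threshold: u64, part_size: u64) -> CopyStrategy {
    if object_size <= threshold {
        CopyStrategy::Single
    } else {
        // Ceiling division: the last part may be smaller than part_size.
        let num_parts = (object_size + part_size - 1) / part_size;
        CopyStrategy::Multipart { num_parts }
    }
}
```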
Should this first try to use CopyObject and then fall back if it fails due to size?
Good point - let me see if that's straightforward.
The issue with that approach is that on error, AWS does not respond with anything more specific than "InvalidRequest":
```xml
<Error>
  <Code>InvalidRequest</Code>
  <Message>The specified copy source is larger than the maximum allowable size for a copy source: 5368709120</Message>
  <RequestId>8550KAYYHRYF33SM</RequestId>
  <HostId>R7zaiPWt96z/yQm2PtDT+pyFmYF76YCBcW0AeukdrXpS4qlSuO1nmXTFI4Ak2YcHMsBoymw33j4=</HostId>
</Error>
```
So there's not really a stable API for determining that the request is invalid because of the size of the source.
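For illustration, pulling just the `<Code>` element out of such a body might look like the naive string scan below (not the crate's actual error handling; real code would use an XML parser). Note this only underlines the point: the extracted code `InvalidRequest` cannot distinguish a size problem from any other invalid request.

```rust
// Naive substring scan to extract the text of the <Code> element from an
// S3 XML error body. Illustrative only; use a real XML parser in practice.
fn error_code(body: &str) -> Option<&str> {
    let start = body.find("<Code>")? + "<Code>".len();
    let end = body[start..].find("</Code>")? + start;
    Some(&body[start..end])
}
```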