(0.12.4 cherry-pick): aws: support copies >5GB #562
base: release/0.12
Conversation
See apache#339. The old name is kept as a deprecated alias for now.
This used to work once but got broken during refactoring; a test was added to catch regressions. See apache#48, where -- I think -- we decided that instead of printing entire error chains in `Display`, API users should walk the error chain themselves. This is especially relevant for error types that we do NOT control, like upstream `reqwest` types.
`429 Too Many Requests` is the status code HTTP uses to indicate that the client has sent too many requests in a given amount of time. It may sound counter-intuitive to retry in this situation, but that is exactly why we use an exponential backoff mechanism: it gives the server the opportunity to recover without failing requests immediately. The retry mechanism already works for object stores like S3 because they return a server error. Without this change, however, GCS is not handled properly, because it returns the client error `429 Too Many Requests` instead. This change enables retries on that response too. A more advanced retry mechanism would honor the optional `Retry-After` response header, but that is beyond the scope of this PR. Closes: apache#309
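The retry classification described above can be sketched as follows. This is a minimal illustration, not the crate's actual API: `is_retryable` and `backoff` are hypothetical helper names, and the delay values are arbitrary.

```rust
use std::time::Duration;

// Hypothetical helper: treat server errors (5xx) and 429 Too Many Requests
// as retryable, as this change does for GCS.
fn is_retryable(status: u16) -> bool {
    status == 429 || (500..600).contains(&status)
}

// Exponential backoff: the base delay is doubled per attempt, capped at `max`,
// giving the server time to recover between retries.
fn backoff(attempt: u32, base: Duration, max: Duration) -> Duration {
    base.saturating_mul(2u32.saturating_pow(attempt)).min(max)
}
```

A caller would loop while `is_retryable(status)` holds, sleeping for `backoff(attempt, ..)` between attempts until a retry budget is exhausted.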
* fix: update links in release docs and script * Remove another reference to the previous crate setup
* Add (failing) test for retrying connection errors * Fix not retrying connection errors closes apache#368 * Fix clippy error --------- Co-authored-by: John Garland <john.garland@vivcourt.com>
Re-export `HeaderMap`, `HeaderValue`, and `Extensions` from http crate to avoid forcing users to add http dependency when using object_store public API. Fixes apache#263
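The re-export pattern can be illustrated with a minimal sketch. The `http_shim` module and its unit structs below are stand-ins, not the real `http` crate or object_store's actual module layout:

```rust
// Sketch of re-exporting a dependency's types so downstream users don't
// need a direct dependency of their own.
mod http_shim {
    // Stand-ins for the `http` crate's types.
    pub struct HeaderMap;
    pub struct HeaderValue;
    pub struct Extensions;
}

// Users can now write `mycrate::HeaderMap` instead of adding `http`
// to their own Cargo.toml.
pub use http_shim::{Extensions, HeaderMap, HeaderValue};
```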
…32 (apache#468) * chore: fix some clippy 1.89 warnings * fix another warning * Skip some doctests for wasm32
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 5. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](actions/checkout@v4...v5) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '5' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6. - [Release notes](https://github.com/actions/setup-python/releases) - [Commits](actions/setup-python@v5...v6) --- updated-dependencies: - dependency-name: actions/setup-python dependency-version: '6' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/setup-node](https://github.com/actions/setup-node) from 4 to 5. - [Release notes](https://github.com/actions/setup-node/releases) - [Commits](actions/setup-node@v4...v5) --- updated-dependencies: - dependency-name: actions/setup-node dependency-version: '5' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/github-script](https://github.com/actions/github-script) from 7 to 8. - [Release notes](https://github.com/actions/github-script/releases) - [Commits](actions/github-script@v7...v8) --- updated-dependencies: - dependency-name: actions/github-script dependency-version: '8' dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ache#487) On a request retry, an info message is logged stating that an error was encountered, along with information about the retry process, but it has not included any details about the error causing the retry. This PR updates the logging to include the status code if it is a server error, or the HTTP error kind if a transport error occurred. While the last error when retries are exhausted is returned up the call stack, the intermediate errors need not be exactly the same, so it is helpful to include some minimal information about what triggered each retry.
…he#436) These log messages are very noisy.
* Add storage class for aws and gcp * Add azure storage class attribute * Update attribute docs * Update http client
…guration (apache#480) * Allow setting STS endpoint via env var * Properly use AmazonS3Builder::credentials_from_env for AssumeRoleWithWebIdentity auth flow --------- Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* Update version to 0.12.4 * Update update_changelog.sh script * Update changelog * Last touchups * Update changelog
```rust
let multipart_copy_threshold = self
    .multipart_copy_threshold
    .map(|val| val.get())
    .transpose()?
    .unwrap_or(MAX_SINGLE_REQUEST_COPY_SIZE);
let multipart_copy_part_size = self
    .multipart_copy_part_size
    .map(|val| val.get())
    .transpose()?
    .unwrap_or(MAX_SINGLE_REQUEST_COPY_SIZE);
```
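The option-resolution chain in this diff (a fallible user override falling back to a default) can be sketched in isolation. The function name, error type, and the 5 GiB constant here are illustrative:

```rust
// Illustrative default: S3's documented single-request copy limit (5 GiB).
const MAX_SINGLE_REQUEST_COPY_SIZE: u64 = 5 * 1024 * 1024 * 1024;

// Resolve an optional, fallibly-parsed user setting to a concrete value,
// mirroring the `.map(..).transpose()?.unwrap_or(..)` chain in the diff.
fn resolve_threshold(user_value: Option<Result<u64, String>>) -> Result<u64, String> {
    let threshold = user_value
        .transpose()? // Option<Result<T, E>> -> Result<Option<T>, E>, then `?`
        .unwrap_or(MAX_SINGLE_REQUEST_COPY_SIZE);
    Ok(threshold)
}
```

`transpose` turns "maybe a parse result" into "a result of maybe a value", so the `?` operator can surface a parse error while `None` simply falls through to the default.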
Clamp to 5GiB because that's the documented maximum?
I think if someone wants to push it over 5GB, they should be able to. There are many "s3-compatible" object stores that might not share the same limitations.
```rust
// Determine source size to decide between single CopyObject and multipart copy
let head_meta = self
    .client
    .get_opts(
        from,
        GetOptions {
            head: true,
            ..Default::default()
        },
    )
    .await?
    .meta;
```
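The size retrieved by this HEAD request feeds the choice between a single `CopyObject` and a multipart copy. A simplified sketch of that decision, with illustrative names and part-count arithmetic:

```rust
#[derive(Debug, PartialEq)]
enum CopyStrategy {
    Single,
    Multipart { num_parts: u64 },
}

// Choose single vs. multipart copy based on the HEAD-reported object size.
// Objects at or below the threshold use one CopyObject request; larger
// objects are copied in `part_size` chunks via UploadPartCopy.
fn choose_strategy(object_size: u64, threshold: u64, part_size: u64) -> CopyStrategy {
    if object_size <= threshold {
        CopyStrategy::Single
    } else {
        // Ceiling division: the last part may be smaller than part_size.
        let num_parts = (object_size + part_size - 1) / part_size;
        CopyStrategy::Multipart { num_parts }
    }
}
```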
Should this first try to use CopyObject and then fall back if it fails due to size?
Good point - let me see if that's straightforward.
The issue with that approach is that on error, AWS does not respond with anything more specific than "InvalidRequest":
```xml
<Error>
  <Code>InvalidRequest</Code>
  <Message>The specified copy source is larger than the maximum allowable size for a copy source: 5368709120</Message>
  <RequestId>8550KAYYHRYF33SM</RequestId>
  <HostId>R7zaiPWt96z/yQm2PtDT+pyFmYF76YCBcW0AeukdrXpS4qlSuO1nmXTFI4Ak2YcHMsBoymw33j4=</HostId>
</Error>
```
So there's not really a stable API for determining that the request is invalid because of the size of the source.
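For illustration, pulling just the `<Code>` element out of such a body might look like the naive string scan below (not the crate's actual error handling; real code would use an XML parser). Note this only underlines the point: the extracted code `InvalidRequest` cannot distinguish a size problem from any other invalid request.

```rust
// Naive substring scan to extract the text of the <Code> element from an
// S3 XML error body. Illustrative only; use a real XML parser in practice.
fn error_code(body: &str) -> Option<&str> {
    let start = body.find("<Code>")? + "<Code>".len();
    let end = body[start..].find("</Code>")? + start;
    Some(&body[start..end])
}
```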