Fix multipart download response metadata for presigned URL and normal paths#7077

Open
jencymaryjoseph wants to merge 7 commits into feature/master/pre-signed-url-getobject from jencyjos/presignedurl/multipart-download-metadata

@jencymaryjoseph jencymaryjoseph commented Jun 25, 2026

Contributor

Motivation and Context

When the S3 multipart async client downloads a large object via a presigned URL (using ranged GETs for each part), the response metadata exposed to the customer reflects only the first part — not the full object. Customers see an incorrect contentLength (part size instead of total), a partial contentRange, and meaningless per-part composite checksum values.
This fix rewrites the first part's response into a full-object response before it reaches the customer, across both presigned download paths (parallel toFile and serial toBytes/custom transformers).

Modifications

The fix has two prongs, because the two download architectures hand off the response in different places:

  • Parallel path (toFile) — a subscriber manages all parts concurrently and owns the result future. We rewrite the response just before completing the future.
    ParallelPresignedUrlMultipartDownloaderSubscriber calls toFullObjectResponse(...) before resultFuture.complete(...).
  • Serial path (toBytes, custom transformers) — parts flow one at a time through the splitting infrastructure, which calls the customer's onResponse() with the first part's response. We inject a responseMapper so the rewrite happens at the onResponse() delivery point, before the customer sees it.
    PresignedUrlDownloadHelper uses splitWithResponseRewrite(...).

Both prongs use the same MultipartDownloadUtils.toFullObjectResponse() to do the rewrite:

  • contentLength → total object size (parsed from Content-Range)
  • contentRange → bytes 0-(total-1)/total
  • all checksum value fields → null when checksumType is COMPOSITE (composite checksums are per-part hashes, not valid for the whole object)
  • all other fields (etag, versionId, etc.) are preserved via toBuilder()
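
The rewrite rules above can be sketched in plain Java. This is a minimal illustration, not the PR's implementation: PartResponse is a hypothetical stand-in for the SDK's GetObjectResponse, it carries a single representative checksum field, and the Content-Range regex is an assumption about the header format (bytes start-end/total).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for GetObjectResponse; only the fields discussed above.
final class PartResponse {
    final Long contentLength;
    final String contentRange;
    final String checksumType;   // "COMPOSITE" or "FULL_OBJECT"; may be null
    final String checksumCrc32;  // one representative checksum value field
    final String eTag;

    PartResponse(Long contentLength, String contentRange, String checksumType,
                 String checksumCrc32, String eTag) {
        this.contentLength = contentLength;
        this.contentRange = contentRange;
        this.checksumType = checksumType;
        this.checksumCrc32 = checksumCrc32;
        this.eTag = eTag;
    }
}

final class FullObjectRewriteSketch {
    // Matches "bytes <start>-<end>/<total>" and captures <total>.
    private static final Pattern CONTENT_RANGE = Pattern.compile("^bytes \\d+-\\d+/(\\d+)$");

    // Returns the total object size, or empty when the header is absent or unparseable.
    static Optional<Long> totalSize(String contentRange) {
        if (contentRange == null) {
            return Optional.empty();
        }
        Matcher m = CONTENT_RANGE.matcher(contentRange.trim());
        return m.matches() ? Optional.of(Long.parseLong(m.group(1))) : Optional.empty();
    }

    // Rewrites the first part's metadata into full-object metadata.
    static PartResponse toFullObject(PartResponse first) {
        Optional<Long> total = totalSize(first.contentRange);
        if (!total.isPresent()) {
            return first; // no Content-Range: leave the response untouched
        }
        long size = total.get();
        boolean composite = "COMPOSITE".equals(first.checksumType);
        return new PartResponse(
            size,
            "bytes 0-" + (size - 1) + "/" + size,
            first.checksumType,
            composite ? null : first.checksumCrc32, // per-part hash is not valid for the whole object
            first.eTag);                             // all other fields are carried over unchanged
    }
}
```

For a first part reporting "bytes 0-7/20", the sketch produces a contentLength of 20 and a contentRange of "bytes 0-19/20"; a response without Content-Range is returned as-is, matching the no-op case listed under Testing.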

Response-mapper injection (sdk-core, shared infrastructure). Rather than adding a new split method, responseMapper is carried on the existing public SplittingTransformerConfiguration. The default split(config) reads it and threads it into SplittingTransformer / ByteArraySplittingTransformer, which apply it at their onResponse boundary (defaulting to identity when unset). This keeps the public API surface to a single optional config setter — no new method, no instanceof branching — and every transformer (including FileAsyncResponseTransformer) is handled through its own existing split(config).
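
A rough sketch of the idea, using hypothetical stand-in classes rather than the real SplittingTransformerConfiguration and SplittingTransformer: the config optionally carries a UnaryOperator, and the splitting side applies it (or identity) at the point where the first response is delivered. All class and method names below are assumptions for illustration.

```java
import java.util.function.UnaryOperator;

// Hypothetical, simplified stand-in for a split configuration with an optional response mapper.
final class SplitConfigSketch<R> {
    private final UnaryOperator<R> responseMapper; // null when the caller did not set one

    private SplitConfigSketch(UnaryOperator<R> responseMapper) {
        this.responseMapper = responseMapper;
    }

    static <R> SplitConfigSketch<R> withDefaults() {
        return new SplitConfigSketch<>(null);
    }

    static <R> SplitConfigSketch<R> withMapper(UnaryOperator<R> mapper) {
        return new SplitConfigSketch<>(mapper);
    }

    // Defaults to identity so existing callers see no behavior change.
    UnaryOperator<R> responseMapperOrIdentity() {
        return responseMapper != null ? responseMapper : UnaryOperator.identity();
    }
}

// Hypothetical splitting transformer: applies the mapper at its onResponse boundary.
final class SplittingSketch<R> {
    private final UnaryOperator<R> mapper;
    private R delivered;

    SplittingSketch(SplitConfigSketch<R> config) {
        this.mapper = config.responseMapperOrIdentity();
    }

    // Called with the first part's response; the customer only ever sees the mapped value.
    void onResponse(R firstPartResponse) {
        this.delivered = mapper.apply(firstPartResponse);
    }

    R delivered() {
        return delivered;
    }
}
```

Because the mapper travels on the config, any transformer that already accepts the config can pick it up without a type check on the transformer itself.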

Presigned 416 fix. Broadened the empty-object fallback catch to also match a raw S3Exception with status 416. The serial path surfaces the raw exception directly (unwrapped), so the original catch on EmptyObjectRangeNotSatisfiableException alone never matched for custom transformers, skipping the fallback. As a follow-up, EmptyObjectRangeNotSatisfiableException will be removed entirely and both paths unified on isRangeNotSatisfiable().
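
One way a unified check could look, as a hedged sketch only: ServiceErrorSketch is a hypothetical stand-in for S3Exception, and the unwrapping of CompletionException is an assumption about how the failure reaches the catch block, not a description of the PR's actual isRangeNotSatisfiable().

```java
import java.util.concurrent.CompletionException;

// Hypothetical stand-in for a service exception that exposes the HTTP status code.
final class ServiceErrorSketch extends RuntimeException {
    private final int statusCode;

    ServiceErrorSketch(int statusCode, String message) {
        super(message);
        this.statusCode = statusCode;
    }

    int statusCode() {
        return statusCode;
    }
}

final class RangeChecks {
    private static final int RANGE_NOT_SATISFIABLE = 416;

    // True when the failure, possibly wrapped by a future, is an HTTP 416 from the service.
    static boolean isRangeNotSatisfiable(Throwable t) {
        Throwable cause = t;
        // Futures commonly wrap the real failure in CompletionException; peel those layers off.
        while (cause instanceof CompletionException && cause.getCause() != null) {
            cause = cause.getCause();
        }
        return cause instanceof ServiceErrorSketch
               && ((ServiceErrorSketch) cause).statusCode() == RANGE_NOT_SATISFIABLE;
    }
}
```

Checking the status code rather than a wrapper type means the parallel and serial paths can share one predicate regardless of where the exception was raised.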

Transfer Manager progress. With the mapper on the config, splitWithResponseRewrite is a normal transformer.split(config) call, so the TM's progress wrapper's own split() runs and counts bytes correctly. No special routing is needed in GenericS3TransferManager.

Testing

Unit tests

  • MultipartDownloadUtilsTest — toFullObjectResponse(): content-length/range rewrite, checksum nulling for COMPOSITE, preservation for FULL_OBJECT, no-op when Content-Range is absent.
  • SplittingTransformerConfigurationTest — config carries the response mapper.

WireMock tests

  • PresignedUrlMultipartDownloadResponseMetadataWiremockTest — toFile and toBytes see full-object metadata.
  • PresignedUrlMultipartDownloaderSubscriberWiremockTest — 416 fallback works for custom serial transformers (fails without the fix).
  • S3TransferManagerPresignedUrlListenerWiremockTest — transferInitiated / bytesTransferred / transferFailed fire correctly for presigned downloads across multipart/non-multipart × toFile/toBytes × ranged/non-ranged.

Integration tests

  • AsyncPresignedUrlExtensionTestSuite — presigned toBytes/toFile metadata assertions, MPU-with-checksum (COMPOSITE) handling.

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Checklist

  • I have read the CONTRIBUTING document
  • Local run of mvn install succeeds
  • My code follows the code style of this project
  • My change requires a change to the Javadoc documentation
  • I have updated the Javadoc documentation accordingly
  • I have added tests to cover my changes
  • All new and existing tests passed
  • I have added a changelog entry. Adding a new entry must be accomplished by running the scripts/new-change script and following the instructions. Commit the new file created by the script in .changes/next-release with your changes.
  • My change is to implement 1.11 parity feature and I have updated LaunchChangelog

License

  • I confirm that this pull request can be released under the Apache 2 license

@jencymaryjoseph jencymaryjoseph requested a review from a team as a code owner June 25, 2026 17:50
}

@Test
void multipartDownload_checksumModeEnabled_hasCorrectFullObjectMetadata() throws Exception {

Contributor

Is this test necessary?

}

@Test
void multipartDownload_toBytes_smallObject_hasCorrectFullObjectMetadata() throws Exception {

Contributor

Can we consolidate this with multipartDownload_toFile_hasCorrectFullObjectMetadata using parameterized tests?

}

@Test
void getObject_withRangeRequest_preservesPartialMetadata() throws Exception {

Contributor

Same here, let's try to consolidate tests with parameterized tests

}

@Test
void getObject_mpuObjectWithChecksumMode_hasCorrectMetadata() throws Exception {

Contributor

Same here. How is checksum mode special?

}

// Helper methods
private static void uploadMpuObjectWithChecksum() {

Contributor

Checksum should be enabled by default, any reason we need to upload it with checksum?

Contributor Author

If an object is uploaded via MPU without checksumMode enabled, S3 doesn't return a checksum.
If an object is uploaded via MPU with checksumMode enabled, S3 returns a FULL_OBJECT checksum.
And if it is uploaded with checksum enabled and an explicit checksum algorithm, e.g. .checksumAlgorithm(ChecksumAlgorithm.CRC32), S3 returns a COMPOSITE checksum.

Comment on lines +179 to +182
if (transformer instanceof ByteArrayAsyncResponseTransformer) {
return (SplitResult<GetObjectResponse, T>)
((ByteArrayAsyncResponseTransformer<GetObjectResponse>) transformer).split(splitConfig, mapper);
}

Contributor

Any reason we have special logic for ByteArrayAsyncResponseTransformer? ByteArrayAsyncResponseTransformer is an internal API and is not supposed to be used across modules.

Contributor Author

Oh yeah, removed the instanceof and added split(config, mapper) to the AsyncResponseTransformer interface (with ByteArrayAsyncResponseTransformer overriding it). splitWithResponseRewrite() now just calls transformer.split(splitConfig, mapper)

if (cause instanceof EmptyObjectRangeNotSatisfiableException) {
// Parallel path wraps it as EmptyObjectRangeNotSatisfiableException;
// serial path (toBytes, custom transformers) surfaces raw S3Exception.
if (cause instanceof EmptyObjectRangeNotSatisfiableException

Contributor

Question: what is EmptyObjectRangeNotSatisfiableException?

Contributor Author

EmptyObjectRangeNotSatisfiableException is an internal exception created by the parallel subscriber when it gets a 416 from S3 on a ranged request to an empty object. The serial path doesn't go through the subscriber, so the raw 416 S3Exception arrives without being wrapped. I'm planning to remove this exception class as a follow-up and use isRangeNotSatisfiable() for all paths.

UnaryOperator.identity());
}

private SplittingTransformer(AsyncResponseTransformer<ResponseT, ResultT> upstreamResponseTransformer,

Contributor

Can we update this ctor to take a Builder parameter? That way, we don't need to create a new ctor.

@jencymaryjoseph jencymaryjoseph requested a review from zoewangg June 26, 2026 17:04
? progressUpdater.wrapForNonSerialFileDownload(
responseTransformer, GetObjectRequest.builder().build())
: progressUpdater.wrapResponseTransformer(responseTransformer);
if (isS3ClientMultipartEnabled()

Contributor Author

Fixes a test failure where bytesTransferred did not fire for presigned toBytes multipart downloads.
That path was routed to wrapForNonSerialFileDownload, which only counts bytes inside its split() override. The serial download splits and drives onStream directly, so it bypassed that counting. The path is now routed by parallelSplitSupported(), so serial toBytes uses wrapResponseTransformerForMultipartDownload (which counts in onStream), mirroring the regular download path.

* Creates a {@link SplitResult} with a response mapper applied at the upstream {@code onResponse} delivery point.
*/
@SdkInternalApi
default SplitResult<ResponseT, ResultT> split(SplittingTransformerConfiguration splitConfig,

Contributor

IMO all public methods in a public API class are inherently public APIs, so we can't really add SdkInternalApi. Should we consider folding responseMapper into SplittingTransformerConfiguration? That way, we don't have to introduce another method.

Contributor Author

Moved responseMapper onto SplittingTransformerConfiguration and removed the extra split method.

this(upstreamResponseTransformer, resultFuture, UnaryOperator.identity());
}

public ByteArraySplittingTransformer(AsyncResponseTransformer<ResponseT, ResponseBytes<ResponseT>>

Contributor

Why do we need a new ctor? Can we just add a new parameter?

Contributor Author

Cleaned up. The mapper now comes from the config rather than from a separate split(config, mapper).

: progressUpdater.wrapResponseTransformer(responseTransformer);
if (isS3ClientMultipartEnabled()
&& presignedDownloadRequest.presignedUrlDownloadRequest().range() == null) {
if (responseTransformer.split(b -> b.bufferSizeInBytes(1L)).parallelSplitSupported()) {

Contributor

I'm a bit concerned that invoking responseTransformer.split may have side effects, for example a service call. It is harmless in our implementations today, but we can't guarantee that for future or custom implementations.

Is there another way?

Contributor Author

Good point, removed. With the mapper on the config, the wrapper's own split() handles both the rewrite and byte counting.


private final Map<Integer, ByteBuffer> buffers;

private final UnaryOperator<ResponseT> responseMapper;

Contributor

Question: don't we need to update FileAsyncResponseTransformer as well?

Contributor Author

No. FileAsyncResponseTransformer's existing split(config) now reads the mapper from the config like every other transformer, so it is handled automatically.

* @return full-object response with total content-length, full content-range,
* and checksum values nulled if checksum type is COMPOSITE
*/
public static GetObjectResponse toFullObjectResponse(GetObjectResponse firstPartResponse) {

Contributor

Should we include other fields such as etag, version ID etc if they are present?

Contributor Author

firstPartResponse.toBuilder() copies all fields. The function only overrides contentLength and contentRange, and nulls the checksum value fields when the checksum type is COMPOSITE.

2 participants