Commits


Oliver Layer authored and GitHub committed 9e6acbe08a0
GH-40557: [C++] Use `PutObject` request for S3 in OutputStream when only uploading small data (#41564)

### Rationale for this change

See #40557. The previous implementation always issued multipart uploads, which cost 3x RTT to S3 (`CreateMultipartUpload`, `UploadPart`, `CompleteMultipartUpload`) instead of the 1x RTT of a single `PutObject` request.

### What changes are included in this PR?

Implement logic in the S3 `OutputStream` to use a `PutObject` request if the data is below a certain threshold (5 MB) when the output stream is closed. If more data is written, a multipart upload is triggered.

Note: Previously, opening the output stream was already expensive because the `CreateMultipartUpload` request was triggered at that point. With this change, opening the output stream becomes cheap, because we wait until some data is written before deciding which upload method to use. This required some more state-keeping in the output stream class.

### Are these changes tested?

No new tests were added, as there are already tests for very small writes and very large writes, which trigger both ways of uploading. Everything should therefore be covered by existing tests.

### Are there any user-facing changes?

- Previously, we would fail when opening the output stream if the bucket doesn't exist. We inferred that from the `CreateMultipartUpload` request, which we no longer send upon opening the stream. We now fail at closing instead, or at writing (when >5MB have accumulated). Replicating the old behavior is not possible without sending another request, which would defeat the purpose of this performance optimization. I hope this is fine.

* GitHub Issue: #40557

Lead-authored-by: Oliver Layer <o.layer@celonis.de>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
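
The upload decision described in the commit message can be illustrated with a minimal sketch. This is not the code from the PR: `SketchOutputStream`, its `requests()` log, and the configurable threshold are hypothetical names introduced for illustration, and no AWS SDK calls are made. The sketch only records which S3 request names would be sent, assuming that reaching the threshold is what triggers the switch to a multipart upload.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the S3 output stream (not Arrow's actual class).
// It logs the names of the requests it would send instead of calling S3.
class SketchOutputStream {
 public:
  // The commit message names a 5 MB threshold; making it a constructor
  // parameter is an assumption so that small values can be used in tests.
  explicit SketchOutputStream(std::size_t threshold = 5 * 1024 * 1024)
      : threshold_(threshold) {}

  // Opening (constructing) the stream sends nothing. Data is buffered until
  // the threshold is reached; only then is a multipart upload started.
  void Write(const std::string& data) {
    buffer_ += data;
    if (buffer_.size() >= threshold_) {
      if (!multipart_started_) {
        requests_.push_back("CreateMultipartUpload");
        multipart_started_ = true;
      }
      requests_.push_back("UploadPart");
      buffer_.clear();
    }
  }

  // On close, a stream that never reached the threshold sends a single
  // PutObject; otherwise the pending multipart upload is completed.
  void Close() {
    if (closed_) return;
    closed_ = true;
    if (!multipart_started_) {
      requests_.push_back("PutObject");
    } else {
      if (!buffer_.empty()) requests_.push_back("UploadPart");
      requests_.push_back("CompleteMultipartUpload");
    }
    buffer_.clear();
  }

  // Names of the requests that would have been issued, in order.
  const std::vector<std::string>& requests() const { return requests_; }

 private:
  std::size_t threshold_;
  std::string buffer_;
  std::vector<std::string> requests_;
  bool multipart_started_ = false;
  bool closed_ = false;
};
```

In this sketch a small write followed by `Close()` logs one request, while a write that reaches the threshold logs three, which mirrors the 1x versus 3x RTT difference the commit message describes. It also shows why a missing bucket can only be detected at `Close()` or once enough data has accumulated: no request is sent before then.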