Commits


Antoine Pitrou authored and David Li committed 9a5d010556a
ARROW-8692: [C++] Avoid memory copies when downloading from S3

The AWS SDK creates an auto-growing StringStream by default, entailing multiple memory copies when transferring large data blocks (because of resizes). Instead, write directly into the target data area.

Low-level benchmarks with a local Minio server:

* before:
```
-----------------------------------------------------------------------------------------------------
Benchmark                                          Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------
MinioFixture/ReadAll500Mib/real_time       434528630 ns    431461370 ns            2 bytes_per_second=1.1237G/s items_per_second=2.30134/s
MinioFixture/ReadChunked500Mib/real_time   419380389 ns    339293384 ns            2 bytes_per_second=1.16429G/s items_per_second=2.38447/s
MinioFixture/ReadCoalesced500Mib/real_time 258812283 ns       470149 ns            3 bytes_per_second=1.88662G/s items_per_second=3.8638/s
```

* after:
```
MinioFixture/ReadAll500Mib/real_time       194620947 ns    161227337 ns            4 bytes_per_second=2.50888G/s items_per_second=5.13819/s
MinioFixture/ReadChunked500Mib/real_time   276437393 ns    183030215 ns            3 bytes_per_second=1.76634G/s items_per_second=3.61746/s
MinioFixture/ReadCoalesced500Mib/real_time  86693750 ns       448568 ns            6 bytes_per_second=5.63225G/s items_per_second=11.5349/s
```

Parquet read benchmarks from a local Minio server show speedups from 1.1x to 1.9x.

Closes #7098 from pitrou/ARROW-8692-s3-avoid-copies

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: David Li <li.davidm96@gmail.com>
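
Illustration of the idea (a sketch, not the code from this commit): an auto-growing stream can be replaced by a stream buffer whose put area is a caller-owned, preallocated region, so each chunk is written once into its final location. The names `FixedBufferStreamBuf` and `WriteChunksInto` are hypothetical and exist only for this example; how such a buffer would be wired into the AWS SDK is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>
#include <vector>

// Hypothetical helper (not the Arrow or AWS SDK class): a stream buffer whose
// put area is a caller-owned, preallocated memory region.
class FixedBufferStreamBuf : public std::streambuf {
 public:
  FixedBufferStreamBuf(char* data, std::size_t size) {
    // All writes land directly in [data, data + size).
    setp(data, data + size);
  }

  // Number of bytes written into the region so far.
  std::size_t bytes_written() const {
    return static_cast<std::size_t>(pptr() - pbase());
  }

  // overflow() is deliberately not overridden: the base class reports failure
  // once the region is full, so the buffer never grows or reallocates.
};

// Simulates a transfer that delivers a payload in several chunks, the way a
// client library might write a response body to an output stream.
// Returns the number of bytes that landed in `target`.
std::size_t WriteChunksInto(const std::vector<std::string>& chunks,
                            std::vector<char>* target) {
  FixedBufferStreamBuf buf(target->data(), target->size());
  std::ostream out(&buf);
  for (const auto& chunk : chunks) {
    out.write(chunk.data(), static_cast<std::streamsize>(chunk.size()));
  }
  return buf.bytes_written();
}
```

Because the destination is sized up front (for a ranged read the length is known in advance), no intermediate buffer is resized and no final copy out of a string stream is needed. Writes past the end of the region fail instead of growing it.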