Commits

Wes McKinney authored 2164e3b435f
ARROW-4398: [C++][Python][Parquet] Improve BYTE_ARRAY PLAIN encoding write performance. Add BYTE_ARRAY write benchmarks

Use BufferBuilder and UnsafeAppend to accelerate writes.

Before (prior to ARROW-6381, which came up while investigating this):

```
----------------------------------------------------------------------------------
Benchmark                                          Time           CPU  Iterations
----------------------------------------------------------------------------------
BM_ArrowBinaryPlain/EncodeArrow/262144          6111690 ns    6109829 ns    465   246.06MB/s
BM_ArrowBinaryPlain/EncodeArrow/1048576        30470849 ns   30451048 ns     85   197.296MB/s
BM_ArrowBinaryPlain/EncodeLowLevel/262144       5352838 ns    5352679 ns    514   280.866MB/s
BM_ArrowBinaryPlain/EncodeLowLevel/1048576     29736017 ns   29735036 ns     94   202.047MB/s
```

After

```
----------------------------------------------------------------------------------
Benchmark                                          Time           CPU  Iterations
----------------------------------------------------------------------------------
BM_ArrowBinaryPlain/EncodeArrow/262144          2020914 ns    2020905 ns   1000   743.918MB/s
BM_ArrowBinaryPlain/EncodeArrow/1048576        11596223 ns   11596094 ns    242   518.096MB/s
BM_ArrowBinaryPlain/EncodeLowLevel/262144       2740316 ns    2740256 ns   1021   548.63MB/s
BM_ArrowBinaryPlain/EncodeLowLevel/1048576     17562138 ns   17560763 ns    157   342.12MB/s
```

Dictionary encoding perf is not really affected by this work, so this will mostly affect that case where we fall back to PLAIN encoding when the dictionary grows large.

Closes #5233 from wesm/ARROW-4398 and squashes the following commits:

3a8c37ac9 <Wes McKinney> Code review comments. Don't box string_view as ByteArray
252b45c4c <Wes McKinney> Fix -Wsign-compare error
f77838cac <Wes McKinney> Improve benchmark
f7e659d29 <Wes McKinney> Add ASV benchmarks for binary writes
844427ca3 <Wes McKinney> More perf improvements, add benchmark for low level vs high level
ab2ad26ef <Wes McKinney> Use unsafe appends for BinaryArray for better performance
e6458d2aa <Wes McKinney> Use BufferBuilder in PlainEncoder, use UnsafeAppend to avoid calling Reserve so much
47f10cde7 <Wes McKinney> Add benchmark of direct Arrow write

Authored-by: Wes McKinney <wesm+git@apache.org>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
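
For readers unfamiliar with the "reserve once, then append unchecked" pattern the commit message refers to, here is a minimal standalone sketch. It is not the code from this commit: `ByteSink` and `EncodePlain` are hypothetical names made up for illustration, and `ByteSink` only mimics the general shape of a builder with `Reserve` and `UnsafeAppend`. The one thing taken as given is the Parquet PLAIN layout for BYTE_ARRAY values, a 4-byte little-endian length followed by the raw bytes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical growable byte buffer; a stand-in for illustration only,
// not Arrow's BufferBuilder.
class ByteSink {
 public:
  // Make room for `extra` more bytes. This is the only call that may reallocate.
  void Reserve(std::size_t extra) {
    if (size_ + extra > data_.size()) {
      data_.resize(std::max(data_.size() * 2, size_ + extra));
    }
  }

  // Copy `n` bytes with no capacity check in release builds.
  // The caller must have reserved enough space beforehand.
  void UnsafeAppend(const void* src, std::size_t n) {
    if (n == 0) return;
    assert(size_ + n <= data_.size());
    std::memcpy(data_.data() + size_, src, n);
    size_ += n;
  }

  std::size_t size() const { return size_; }
  const std::uint8_t* data() const { return data_.data(); }

 private:
  std::vector<std::uint8_t> data_;
  std::size_t size_ = 0;
};

// Encode values as PLAIN BYTE_ARRAY: for each value, a 4-byte little-endian
// length prefix followed by the value's bytes.
void EncodePlain(const std::vector<std::string>& values, ByteSink* sink) {
  // First pass: compute the exact output size so Reserve runs once.
  std::size_t total = 0;
  for (const std::string& v : values) {
    total += sizeof(std::uint32_t) + v.size();
  }
  sink->Reserve(total);

  // Second pass: append without per-value capacity checks.
  for (const std::string& v : values) {
    const std::uint32_t len = static_cast<std::uint32_t>(v.size());
    std::uint8_t prefix[4];
    for (int i = 0; i < 4; ++i) {
      prefix[i] = static_cast<std::uint8_t>(len >> (8 * i));
    }
    sink->UnsafeAppend(prefix, sizeof(prefix));
    sink->UnsafeAppend(v.data(), v.size());
  }
}
```

The design point is that the capacity check and any reallocation happen once per batch rather than once per value, which is plausibly where the speedup in the benchmarks above comes from; the actual change may differ in detail.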