Wes McKinney authored ab908cc0486
ARROW-6411: [Python][Parquet] Improve performance of DictEncoder::PutIndices

The `EncodeDictDirect` benchmarks are about 1.5x to 1.7x faster with this change, though I don't really understand why.

before

```
----------------------------------------------------------------------------------------
Benchmark                                                 Time           CPU Iterations
----------------------------------------------------------------------------------------
BM_ArrowBinaryDict/EncodeDictDirectInt8/1048576     7334087 ns    7333876 ns         98   136.354M items/s
BM_ArrowBinaryDict/EncodeDictDirectInt16/1048576    7022430 ns    7022412 ns        100   142.401M items/s
BM_ArrowBinaryDict/EncodeDictDirectInt32/1048576    7061033 ns    7060870 ns         99   141.626M items/s
BM_ArrowBinaryDict/EncodeDictDirectInt64/1048576    7084581 ns    7084398 ns         97   141.155M items/s
```

after

```
----------------------------------------------------------------------------------------
Benchmark                                                 Time           CPU Iterations
----------------------------------------------------------------------------------------
BM_ArrowBinaryDict/EncodeDictDirectInt8/1048576     4387151 ns    4387175 ns        156   227.937M items/s
BM_ArrowBinaryDict/EncodeDictDirectInt16/1048576    4446167 ns    4446074 ns        159   224.918M items/s
BM_ArrowBinaryDict/EncodeDictDirectInt32/1048576    4501028 ns    4500934 ns        156   222.176M items/s
BM_ArrowBinaryDict/EncodeDictDirectInt64/1048576    4635792 ns    4635728 ns        150   215.716M items/s
```

On an i9-9960X CPU, before these changes `perf` reported that `__memmove_avx_unaligned_erms` was taking up a lot of time. In principle `std::vector::reserve` should be correct here, since it does not initialize the reserved memory, but something unexpected seems to be going on. If anyone has any ideas, I'm interested to learn more. In any case, I'll stick with the empirical benchmark evidence on this.

I started to refactor this to use `TypedBufferBuilder<int32_t>`, but I'm not sure how its performance for scalar appends compares with `std::vector`, so I'll leave that for future experimentation.
Closes #5248 from wesm/ARROW-6411 and squashes the following commits:

b1159ec8a <Wes McKinney> Add C++ benchmarks for DictEncoder<T>::PutIndices
da8cc9d79 <Wes McKinney> Add C++ benchmarks
5a73bf509 <Wes McKinney> Add Python benchmark

Authored-by: Wes McKinney <wesm+git@apache.org>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
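The commit message leaves open why `std::vector::reserve` was associated with heavy `__memmove_avx_unaligned_erms` time. One plausible hypothesis, which this commit does not confirm: if the index buffer persists across calls and each call does `reserve(size() + n)`, common standard libraries allocate exactly the requested capacity, so every call reallocates and copies everything buffered so far. By contrast, `push_back` and `resize` grow capacity geometrically, which amortizes the copying. The sketch below compares the two patterns on a plain `std::vector<std::int32_t>`. `AppendStats`, `AppendWithReserve`, and `AppendWithResize` are hypothetical names used for illustration, not Arrow's `DictEncoder` code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an encoder's buffered indices (not Arrow's code).
struct AppendStats {
  std::vector<std::int32_t> buffer;
  int reallocations = 0;  // number of calls in which capacity() changed
};

// Pattern A: reserve exactly size() + n on every call, then push_back.
// The standard only requires capacity() >= the request, but common
// implementations allocate exactly that much, so a buffer that keeps
// growing across calls can be reallocated (and copied) on every call.
void AppendWithReserve(AppendStats& s, const std::int32_t* values, std::size_t n) {
  const std::size_t old_cap = s.buffer.capacity();
  s.buffer.reserve(s.buffer.size() + n);
  for (std::size_t i = 0; i < n; ++i) {
    s.buffer.push_back(values[i]);  // never reallocates after the reserve
  }
  if (s.buffer.capacity() != old_cap) ++s.reallocations;
}

// Pattern B: resize() once, then write through indices. When resize() has
// to grow, common implementations apply a geometric growth policy, so
// reallocations become rare as the buffer gets larger.
void AppendWithResize(AppendStats& s, const std::int32_t* values, std::size_t n) {
  const std::size_t old_cap = s.buffer.capacity();
  const std::size_t pos = s.buffer.size();
  s.buffer.resize(pos + n);
  for (std::size_t i = 0; i < n; ++i) {
    s.buffer[pos + i] = values[i];
  }
  if (s.buffer.capacity() != old_cap) ++s.reallocations;
}
```

With libstdc++, 100 calls of 1,000 values each would be expected to reallocate on every call with the first pattern and only about eight times with the second; the exact counts depend on the standard library's growth policy.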