Wes McKinney authored and Antoine Pitrou committed 3fa47f24190
ARROW-8928: [C++] Add microbenchmarks to help measure ExecBatchIterator overhead

These are only preliminary benchmarks, but they may help in examining the microperformance overhead related to `ExecBatch` and its implementation (as a `vector<Datum>`). It may be desirable to devise an "array reference" data structure with few or no heap-allocated members and no `shared_ptr` interactions required to obtain memory addresses and other array information.

On my test machine (macOS, i9-9880H @ 2.3 GHz), I see about 472 CPU cycles of per-field overhead for each ExecBatch produced. These benchmarks take a record batch with 1M rows and 10 columns/fields and iterate through the rows in smaller ExecBatches of the indicated sizes:

```
BM_ExecBatchIterator/256     8207877 ns   8204914 ns     81  items_per_second=121.878/s
BM_ExecBatchIterator/512     4421049 ns   4419958 ns    166  items_per_second=226.247/s
BM_ExecBatchIterator/1024    2056636 ns   2055369 ns    333  items_per_second=486.531/s
BM_ExecBatchIterator/2048    1056415 ns   1056264 ns    682  items_per_second=946.733/s
BM_ExecBatchIterator/4096     514276 ns    514136 ns   1246  items_per_second=1.94501k/s
BM_ExecBatchIterator/8192     262539 ns    262391 ns   2736  items_per_second=3.81111k/s
BM_ExecBatchIterator/16384    128995 ns    128974 ns   5398  items_per_second=7.75351k/s
BM_ExecBatchIterator/32768     64987 ns     64970 ns  10811  items_per_second=15.3917k/s
```

So for the 1024 case, it takes 2,055,369 ns to iterate through all 1024 batches. That seems a bit expensive to me; I suspect we can do better while also improving compilation times and reducing generated code size by using simpler data structures in our compute internals.

Closes #9280 from wesm/cpp-compute-microbenchmarks

Lead-authored-by: Wes McKinney <wesm@apache.org>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>