Wes McKinney authored and Antoine Pitrou committed 3fa47f24190
ARROW-8928: [C++] Add microbenchmarks to help measure ExecBatchIterator overhead

These are only preliminary benchmarks, but they may help in examining the microperformance overhead related to `ExecBatch` and its implementation (as a `vector<Datum>`). It may be desirable to devise an "array reference" data structure with few or no heap-allocated members and no `shared_ptr` interactions required to obtain memory addresses and other array information.

On my test machine (macOS, i9-9880H @ 2.3 GHz), I see about 472 CPU cycles of per-field overhead for each ExecBatch produced. These benchmarks take a record batch with 1M rows and 10 columns/fields and iterate through the rows in smaller ExecBatches of the indicated sizes:

```
BM_ExecBatchIterator/256     8207877 ns   8204914 ns     81  items_per_second=121.878/s
BM_ExecBatchIterator/512     4421049 ns   4419958 ns    166  items_per_second=226.247/s
BM_ExecBatchIterator/1024    2056636 ns   2055369 ns    333  items_per_second=486.531/s
BM_ExecBatchIterator/2048    1056415 ns   1056264 ns    682  items_per_second=946.733/s
BM_ExecBatchIterator/4096     514276 ns    514136 ns   1246  items_per_second=1.94501k/s
BM_ExecBatchIterator/8192     262539 ns    262391 ns   2736  items_per_second=3.81111k/s
BM_ExecBatchIterator/16384    128995 ns    128974 ns   5398  items_per_second=7.75351k/s
BM_ExecBatchIterator/32768     64987 ns     64970 ns  10811  items_per_second=15.3917k/s
```

So for the 1024 case, it takes 2,055,369 ns to iterate through all 1024 batches. That seems a bit expensive to me; I suspect we can do better while also improving compilation times and reducing generated code size by using simpler data structures in our compute internals.

Closes #9280 from wesm/cpp-compute-microbenchmarks

Lead-authored-by: Wes McKinney <wesm@apache.org>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>