Commits


Rossi Sun authored and GitHub committed 136ad9a166b
GH-45551: [C++][Acero] Release temp states of Swiss join building hash table to reduce memory consumption (#45552)

### Rationale for this change

#45551 describes the basic idea. Some profiling from real cases follows.

Take https://github.com/apache/arrow/blob/a53a77c93217399c4fda8c6328db2c492a30b0b0/cpp/src/arrow/acero/hash_join_node_test.cc#L3368 and print the memory pool stats at the end.

Before this change:
```
heap stats:     peak        total       freed       current     unit    count
  reserved:     22.6 GiB    30.3 GiB    8.5 GiB     21.8 GiB                     not all freed!
 committed:     22.9 GiB    30.6 GiB    8.4 GiB     22.1 GiB                     not all freed!
```

After this change:
```
heap stats:     peak        total       freed       current     unit    count
  reserved:     17.5 GiB    30.3 GiB    16.0 GiB    14.3 GiB                     not all freed!
 committed:     17.8 GiB    30.5 GiB    15.8 GiB    14.7 GiB                     not all freed!
```

The peak committed memory is reduced from `22.9 GiB` to `17.8 GiB`. The size of the reduction varies from case to case, but IMO this is a good improvement for most general cases at zero cost.

### What changes are included in this PR?

Make `hash_table_build_`, which only holds temporary states for building the final hash table, transient, and release it (via pointer) as early as possible.

### Are these changes tested?

Existing tests should suffice.

### Are there any user-facing changes?

None, except that users will see a good reduction in peak memory usage :)

* GitHub Issue: #45551

Authored-by: Rossi Sun <zanmato1984@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
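
The idea of making build-only state transient and releasing it through a pointer can be sketched as follows. This is a minimal illustration, not Arrow's actual code: `BuildScratch`, `FinalTable`, `JoinBuilder`, and all their members are hypothetical names, and the real `hash_table_build_` holds different data. The sketch assumes the temporary state is owned by a `std::unique_ptr` so it can be freed as soon as the final table is populated, rather than living as long as the enclosing object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for state that is needed only while building.
struct BuildScratch {
  std::vector<uint32_t> hashes;
};

// Hypothetical stand-in for the final, long-lived hash table.
struct FinalTable {
  std::vector<uint32_t> slots;
};

class JoinBuilder {
 public:
  explicit JoinBuilder(std::size_t num_rows)
      : scratch_(std::make_unique<BuildScratch>()) {
    scratch_->hashes.resize(num_rows);
  }

  // Populate the final table, then drop the scratch state immediately.
  void Build() {
    assert(scratch_ != nullptr);
    for (std::size_t i = 0; i < scratch_->hashes.size(); ++i) {
      // Toy multiplicative hash; unsigned arithmetic wraps by definition.
      scratch_->hashes[i] = static_cast<uint32_t>(i) * 2654435761u;
    }
    table_.slots.assign(scratch_->hashes.begin(), scratch_->hashes.end());
    // Release the temporary buffers now instead of at destruction time.
    scratch_.reset();
  }

  bool HasScratch() const { return scratch_ != nullptr; }
  std::size_t TableSize() const { return table_.slots.size(); }

 private:
  std::unique_ptr<BuildScratch> scratch_;  // transient, build-only
  FinalTable table_;                       // lives for the whole probe phase
};
```

With a by-value member, the scratch buffers would stay allocated until the owning object is destroyed; holding them behind a pointer lets `Build()` free them before the (typically much longer) probe phase begins, which is where a lower peak would come from in this sketch.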