Anja Kefala authored and GitHub committed b6c4fbc2ce9
GH-40316: [Python] only allocate the ScalarMemoTable when used (#40565)

### Rationale for this change

Mimalloc and jemalloc can allocate a [relatively large amount of memory for the ScalarMemoTable](https://github.com/apache/arrow/issues/40301). For this reason, the ScalarMemoTable should only be allocated when it is used (when `options.deduplicate_objects=True`).

I tested this change, and for small tables it does reduce memory allocation. With `options.deduplicate_objects=False`, the total memory allocated drops from 1.295GB to 174.422MB; almost all of the difference comes from the number of 64MB allocations, which falls from 19 to 1.

After this change:

    📦 Total memory allocated: 174.422MB
    📊 Histogram of allocation size:
        min: 1.000B
        --------------------------------------------
        < 6.000B   : 3064  ▇▇
        < 36.000B  : 7533  ▇▇▇▇
        < 222.000B : 9974  ▇▇▇▇▇
        < 1.319KB  : 53264 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 7.999KB  : 5188  ▇▇▇
        < 48.503KB : 742   ▇
        < 294.066KB: 102   ▇
        < 1.741MB  : 22    ▇
        < 10.556MB : 1     ▇
        <=64.000MB : 1     ▇
        --------------------------------------------
        max: 64.000MB

Before this change:

    📦 Total memory allocated: 1.295GB
    📊 Histogram of allocation size:
        min: 1.000B
        --------------------------------------------
        < 6.000B   : 3064  ▇▇
        < 36.000B  : 7543  ▇▇▇▇
        < 222.000B : 10009 ▇▇▇▇▇
        < 1.319KB  : 53269 ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇
        < 7.999KB  : 5192  ▇▇▇
        < 48.503KB : 761   ▇
        < 294.066KB: 102   ▇
        < 1.741MB  : 22    ▇
        < 10.556MB : 1     ▇
        <=64.000MB : 19    ▇
        --------------------------------------------
        max: 64.000MB

### What changes are included in this PR?

`memo_table` and `unique_values` are now allocated only inside an `if (options.deduplicate_objects)` block. Because they are used within a lambda, they have been changed to shared pointers so that they stay alive for as long as the lambda needs them.

### Are these changes tested?

`deduplicate_objects` is already covered by extensive existing tests and benchmarks:
https://github.com/apache/arrow/blob/b235f83ed10bcad174b267113479a24ca045def5/python/pyarrow/tests/test_pandas.py#L3211
and
https://github.com/apache/arrow/blob/b235f83ed10bcad174b267113479a24ca045def5/python/benchmarks/convert_pandas.py#L71

### Are there any user-facing changes?

Nope.
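The pattern described under "What changes are included in this PR?" can be illustrated with a minimal C++17 sketch. It is not Arrow's code: `ConvertOptions`, `Object`, `WrapFunction` and `MakeWrapFunction` are hypothetical names, a `std::unordered_map` stands in for the ScalarMemoTable, and a `std::shared_ptr<const std::string>` stands in for a converted Python object. The memo table and the list of unique values are allocated only when `deduplicate_objects` is true, and both are held by `std::shared_ptr` so that the returned lambda keeps them alive after the factory function has returned.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the conversion options; only this flag matters here.
struct ConvertOptions {
  bool deduplicate_objects = false;
};

// Hypothetical stand-in for a converted Python object.
using Object = std::shared_ptr<const std::string>;
using WrapFunction = std::function<Object(const std::string&)>;

// Returns a function that turns a raw value into an Object. The memo table is
// only allocated when deduplication is requested.
WrapFunction MakeWrapFunction(const ConvertOptions& options) {
  if (!options.deduplicate_objects) {
    // No memo table on this path: every call creates a fresh Object.
    return [](const std::string& value) -> Object {
      return std::make_shared<std::string>(value);
    };
  }

  // Allocated only when deduplicating. Held by shared_ptr so that the lambda,
  // which outlives this function, keeps both containers alive.
  auto memo_table =
      std::make_shared<std::unordered_map<std::string, std::size_t>>();
  auto unique_values = std::make_shared<std::vector<Object>>();

  return [memo_table, unique_values](const std::string& value) -> Object {
    // Insert the index the value would get if it has not been seen before.
    auto [it, inserted] = memo_table->try_emplace(value, unique_values->size());
    if (inserted) {
      unique_values->push_back(std::make_shared<std::string>(value));
    }
    // Equal values map to the same index, hence to the same shared Object.
    return (*unique_values)[it->second];
  };
}
```

With deduplication on, wrapping `"a"` twice yields the same `Object`; with it off, no memo table exists at all and each call allocates a new one. Capturing the `shared_ptr`s by value, rather than references to locals, is what makes it safe to return the lambda.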
* GitHub Issue: #40316

Lead-authored-by: anjakefala <anja.kefala@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>