Commits


Antoine Pitrou authored and GitHub committed f8a0902cbbf
GH-37630: [C++][Python][Dataset] Allow disabling fragment metadata caching (#45330)

### Rationale for this change

Parquet file fragments currently cache their (Parquet) metadata for later accesses when scanning has finished. This can produce surprisingly high memory consumption in cases where:

1. the dataset is only scanned once, rather than repeatedly (this is very common)
2. there is a high metadata-to-data ratio; this can happen when the schemas on disk are very wide, with few rows per file and/or a low number of columns selected for reading

### What changes are included in this PR?

Add an option to disable metadata caching on Parquet file fragments.

### Are these changes tested?

Yes, by new unit tests. Also, reading a wide dataset locally has been confirmed to consume much less memory when the new option is toggled.

### Are there any user-facing changes?

No.

* GitHub Issue: #37630

Authored-by: Antoine Pitrou <antoine@python.org>
Signed-off-by: Antoine Pitrou <antoine@python.org>
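
The trade-off described in the rationale can be illustrated with a small toy model. This is only a sketch and does not use the Arrow API: `ToyFragment`, `fake_reader`, `holds_metadata`, and the `cache_metadata` flag are hypothetical names invented for the illustration, and the real option added by this PR may be named and placed differently.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ToyFragment:
    """Hypothetical stand-in for a Parquet file fragment (not the Arrow API)."""

    path: str
    load_metadata: Callable[[str], dict]  # hypothetical metadata reader
    cache_metadata: bool = True           # hypothetical option name
    _cached: Optional[dict] = field(default=None, repr=False)

    def scan(self) -> int:
        """Pretend to scan the file and return the row count from its metadata."""
        metadata = self._cached
        if metadata is None:
            metadata = self.load_metadata(self.path)
            if self.cache_metadata:
                # Retained after the scan: cheaper rescans, higher memory use.
                self._cached = metadata
        # With caching disabled, `metadata` is released when scan() returns.
        return metadata["num_rows"]

    @property
    def holds_metadata(self) -> bool:
        """True if the fragment still references metadata after scanning."""
        return self._cached is not None


def fake_reader(path: str) -> dict:
    # Stands in for parsing a wide Parquet footer: many columns, few rows.
    return {"num_rows": 10, "columns": [f"c{i}" for i in range(10_000)]}


cached = ToyFragment("a.parquet", fake_reader)
uncached = ToyFragment("a.parquet", fake_reader, cache_metadata=False)
cached.scan()
uncached.scan()
print(cached.holds_metadata, uncached.holds_metadata)  # True False
```

In this model, a dataset scanned once with many wide files keeps one metadata object alive per fragment when caching is on, which is the memory growth the rationale describes. Turning the flag off trades that memory for re-reading the metadata on any later scan.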