Commits


Jinpeng authored and GitHub committed 7ad300390e8
PARQUET-2323: [C++] Use bitmap to store pre-buffered column chunks (#36649) ### Rationale for this change In https://issues.apache.org/jira/browse/PARQUET-2316 we allow partial buffer in parquet File Reader by storing prebuffered column chunk index in a hash set, and make a copy of this hash set for each rowgroup reader. In extreme conditions where numerous columns are prebuffered and multiple rowgroup readers are created for the same row group , the hash set would incur significant overhead. Using a bitmap instead (with one bit per column chunk indicating whether it's prebuffered or not) would be a reasonsable mitigation, taking 4KB for 32K columns. ### What changes are included in this PR? Switch from a hash set to a bitmap buffer. ### Are these changes tested? Yes, passed unit tests on partial prebuffer. ### Are there any user-facing changes? No. Lead-authored-by: jp0317 <zjpzlz@gmail.com> Co-authored-by: Jinpeng <zjpzlz@163.com> Co-authored-by: Gang Wu <ustcwg@gmail.com> Signed-off-by: Antoine Pitrou <antoine@python.org>