Commits


Gang Wu authored and GitHub committed 7c764c3ef55
GH-34326: [C++][Parquet] Page null_count is incorrect if stats is disabled (#34327)

### Rationale for this change

Parquet ColumnWriter obtains the null_count of a page from the page stats, as shown below ([link](https://github.com/apache/arrow/blob/main/cpp/src/parquet/column_writer.cc#L952)):

```cpp
EncodedStatistics page_stats = GetPageStatistics();
int32_t null_count = static_cast<int32_t>(page_stats.null_count);

DataPageV2 page(combined, num_values, null_count, num_rows, encoding_,
                def_levels_byte_length, rep_levels_byte_length, uncompressed_size,
                pager_->has_compressor(), page_stats);
```

However, the null_count is uninitialized if page stats are not enabled:

```cpp
EncodedStatistics GetPageStatistics() override {
  EncodedStatistics result;
  if (page_statistics_) result = page_statistics_->Encode();
  return result;
}
```

### What changes are included in this PR?

ColumnWriter now collects the null_count by itself. To be safe, it also checks that value against the page stats if they are available.

### Are these changes tested?

Added a test case verifying that the null counts of optional and repeated fields are set properly.

### Are there any user-facing changes?

No.

* Closes: #34326

Authored-by: Gang Wu <ustcwg@gmail.com>
Signed-off-by: Weston Pace <weston.pace@gmail.com>
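
The idea behind the fix can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: `PageStats`, `NullCountingWriter`, `WriteLevels`, and `FinishPage` are hypothetical names, and the sketch assumes that an entry counts as null when its definition level is below the column's maximum definition level. It only shows a writer keeping its own null counter, independent of whether statistics are enabled, and cross-checking it against the stats when they exist.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the encoded page statistics; not the Arrow type.
struct PageStats {
  int64_t null_count = 0;
  bool has_null_count = false;
};

// Hypothetical writer that counts nulls itself instead of relying on stats.
class NullCountingWriter {
 public:
  explicit NullCountingWriter(int16_t max_def_level)
      : max_def_level_(max_def_level) {}

  // Assumption: an entry is null when its definition level is below the max.
  void WriteLevels(const std::vector<int16_t>& def_levels) {
    for (int16_t level : def_levels) {
      if (level < max_def_level_) ++num_buffered_nulls_;
    }
  }

  // Returns the null count for the page being closed and resets the counter.
  // `stats` may be nullptr, which models statistics being disabled.
  int32_t FinishPage(const PageStats* stats) {
    if (stats != nullptr && stats->has_null_count) {
      // Sanity check only: both sources must agree when stats are available.
      assert(stats->null_count == num_buffered_nulls_);
    }
    const int32_t null_count = static_cast<int32_t>(num_buffered_nulls_);
    num_buffered_nulls_ = 0;
    return null_count;
  }

 private:
  int16_t max_def_level_;
  int64_t num_buffered_nulls_ = 0;
};
```

With this shape, closing a page with stats disabled (`FinishPage(nullptr)`) still yields a well-defined null count, because the value never passes through the statistics object.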