Omega Gamage authored and Wes McKinney committed b4acb0bc7f1
PARQUET-1780: [C++] Set ColumnMetadata.encoding_stats field

This solves PARQUET-1780: the ColumnMetadata.encoding_stats field is empty in the parquet-cpp implementation. As a result, the metadata of two Parquet files written with the same data does not match when one is generated by C++ and the other by Scala (parquet-mr).

encoding_stats is a vector of **PageEncodingStats**. PageEncodingStats has three attributes:

- page_type: the page type (data or dictionary)
- encoding: the encoding of the page
- count: the number of pages of this type written with this encoding

The first two can be extracted from information that is already available. For count, I had to add some attributes to existing classes.

Modifications:

For the class **SerializedPageWriter**, added the following two attributes:

    int32_t num_dict_pages_;
    std::pair<int32_t, int32_t> num_data_pages_;

(first: number of un-encoded pages, second: number of encoded pages)

Closes #6370 from omega-gamage/PARQUET-1780 and squashes the following commits:

086af4e8d <Wes McKinney> Code review comments
a9c684b25 <Omega Gamage> Match the implementation with impala implementation
eae56fa4b <Wes McKinney> Simplify PageEncodingStats
54ac1eb15 <Omega Gamage> commit 9eecaaf5fe895f85d5352ec7420267701d6d6e8f Author: Omega Gamage <omega@bigstream.co> Date: Tue Feb 18 14:23:08 2020 +0530

Lead-authored-by: Omega Gamage <omega@bigstream.co>
Co-authored-by: Wes McKinney <wesm+git@apache.org>
Signed-off-by: Wes McKinney <wesm+git@apache.org>
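The encoding_stats bookkeeping described in this commit can be sketched in C++17 as follows. This is only a sketch under stated assumptions, not the parquet-cpp code. The enums mirror a subset of the PageType and Encoding values in the Parquet Thrift definition. EncodingStatsTracker, RecordPage() and Finish() are hypothetical names. The commit uses dedicated counters on SerializedPageWriter. This sketch instead keys a std::map on the (page_type, encoding) pair, so each distinct combination yields one PageEncodingStats entry.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Subset of the PageType / Encoding enums from parquet.thrift (values match
// the Thrift definition; the other members are omitted for brevity).
enum class PageType : int32_t { DATA_PAGE = 0, DICTIONARY_PAGE = 2 };
enum class Encoding : int32_t { PLAIN = 0, PLAIN_DICTIONARY = 2, RLE_DICTIONARY = 8 };

// One entry of ColumnMetaData.encoding_stats: how many pages of a given
// page type were written with a given encoding.
struct PageEncodingStats {
  PageType page_type;
  Encoding encoding;
  int32_t count;
};

// Hypothetical helper (not the parquet-cpp API): a page writer would call
// RecordPage() once per page it serializes, then call Finish() when the
// column chunk metadata is built.
class EncodingStatsTracker {
 public:
  void RecordPage(PageType type, Encoding encoding) {
    // operator[] value-initializes a missing count to 0 before incrementing.
    ++counts_[{type, encoding}];
  }

  // Returns one entry per distinct (page_type, encoding) pair, ordered by
  // page type and then encoding (the std::map key order).
  std::vector<PageEncodingStats> Finish() const {
    std::vector<PageEncodingStats> stats;
    stats.reserve(counts_.size());
    for (const auto& [key, count] : counts_) {
      stats.push_back({key.first, key.second, count});
    }
    return stats;
  }

 private:
  std::map<std::pair<PageType, Encoding>, int32_t> counts_;
};
```

For example, consider a column chunk with one dictionary page and three data pages, one of which was written PLAIN (say, after a dictionary fallback). It would produce three entries: (DATA_PAGE, PLAIN, 1), (DATA_PAGE, RLE_DICTIONARY, 2) and (DICTIONARY_PAGE, <dictionary page encoding>, 1). A map keyed on the pair handles any number of encodings per page type. A fixed set of counters, like the ones the commit adds, assumes the possible combinations are known in advance.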