Commits


mwish authored and GitHub committed 5a618490f94
GH-33115: [C++] Parquet Implement crc in reading and writing Page for DATA_PAGE (v1) (#14351)

This patch adds crc support for writing and reading DATA_PAGE. Crc for dictionary pages and DATA_PAGE_V2 will be added in upcoming patches.

* [x] Implement crc in writing DATA_PAGE
* [x] Implement crc in reading DATA_PAGE
* [x] Add config for writing the page crc and for checking it
* [x] Test DATA_PAGE with crc; the tests may be borrowed from `parquet-mr`
* [x] Use the crc library from https://issues.apache.org/jira/browse/ARROW-17904

One open question: in thirdparty, Arrow imports `crc32c`, which is extracted from leveldb's crc library. However, our standard appears to use crc32, which has a different magic number, so I vendored the implementations mentioned in https://issues.apache.org/jira/browse/ARROW-17904.

The default for `enable crc` in the parquet-mr writer is true, but here I use `false`, because setting it to true may slow down the writer.

* Closes: #33115

Authored-by: mwish <maplewish117@gmail.com>
Signed-off-by: Will Jones <willjones127@gmail.com>
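
The crc32 versus `crc32c` distinction raised in the commit message can be illustrated with a small standalone sketch. It assumes that "magic number" refers to the generator polynomial: CRC-32 (the gzip/zlib variant) uses the reflected polynomial 0xEDB88320, while CRC-32C (Castagnoli) uses 0x82F63B78, so the two produce different checksums for the same bytes. The names `ReflectedCrc`, `PageSketch`, `WritePage`, and `VerifyPage` are hypothetical and are not Arrow or Parquet APIs; the bitwise loop is chosen for clarity, not speed, and is not the vendored implementation. Which bytes of a page the real checksum covers is not modeled here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Reflected generator polynomials of the two CRC variants.
constexpr uint32_t kCrc32Poly = 0xEDB88320u;   // CRC-32 as used by gzip/zlib
constexpr uint32_t kCrc32cPoly = 0x82F63B78u;  // CRC-32C (Castagnoli), as in leveldb's crc32c

// Bit-at-a-time reflected CRC with initial value and final XOR of 0xFFFFFFFF.
// Slow but easy to check by hand; real implementations use tables or SIMD.
inline uint32_t ReflectedCrc(std::string_view data, uint32_t poly) {
  uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit) {
      // If the low bit is set, shift and XOR in the polynomial; else just shift.
      crc = (crc >> 1) ^ (poly & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Hypothetical stand-in for a data page: a payload plus an optional checksum.
struct PageSketch {
  std::string payload;
  std::optional<uint32_t> crc;  // absent when the writer did not enable crc
};

// Writer side: attach a CRC-32 only when the (hypothetical) option is on,
// mirroring the commit's choice to default the writer option to false.
inline PageSketch WritePage(std::string payload, bool enable_crc = false) {
  PageSketch page{std::move(payload), std::nullopt};
  if (enable_crc) page.crc = ReflectedCrc(page.payload, kCrc32Poly);
  return page;
}

// Reader side: a page without a stored checksum cannot be verified and is
// accepted; otherwise the recomputed CRC-32 must match the stored one.
inline bool VerifyPage(const PageSketch& page) {
  return !page.crc.has_value() ||
         *page.crc == ReflectedCrc(page.payload, kCrc32Poly);
}
```

With the standard check input `"123456789"`, the two polynomials give 0xCBF43926 (CRC-32) and 0xE3069283 (CRC-32C), which is why a `crc32c` library cannot stand in for a crc32 one.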