Commits


Donald Tolley authored and GitHub committed dd6d7288e41
GH-39444: [C++][Parquet] Fix crash in Modular Encryption (#39623)

**Rationale for this change:**

This pull request addresses a critical issue (GH-39444) in the C++/Python components of Parquet: a segmentation fault that occurs when processing encrypted datasets with more than 2^15 rows. The fix modifies `cpp/src/parquet/encryption/internal_file_decryptor.cc`, specifically `InternalFileDecryptor::GetColumnDecryptor`. Caching of the `Decryptor` object was removed to resolve the multithreading issue that caused the segmentation fault and encryption failures.

**What changes are included in this PR?**

- Removal of `Decryptor` object caching in `InternalFileDecryptor::GetColumnDecryptor`.
- Addition of two unit tests: `large_row_parquet_encrypt_test.cc` for C++ and an update to `test_dataset_encryption.py` with `test_large_row_encryption_decryption` for Python.

**Are these changes tested?**

Yes, the unit tests (`large_row_parquet_encrypt_test.cc` and `test_large_row_encryption_decryption` in `test_dataset_encryption.py`) have been added to cover these changes.

**Are there any user-facing changes?**

No significant user-facing changes, but the update improves the backend stability and reliability of Parquet file handling.

Calling `DecryptionKeyRetriever::GetKey` could be an expensive operation, potentially involving network calls to key management servers.

* Closes: #39444
* GitHub Issue: #39444

Lead-authored-by: Donald Tolley <tolleybot@gmail.com>
Co-authored-by: Antoine Pitrou <antoine@python.org>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Co-authored-by: Adam Reeve <adreeve@gmail.com>
Co-authored-by: Gang Wu <ustcwg@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
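
The commit message raises two concerns: a shared `Decryptor` is unsafe across threads, and key retrieval may be expensive. The following is a minimal, self-contained sketch of one way to reconcile them, and is not the Arrow implementation. `ToyDecryptor`, `ToyFileDecryptor`, and `DecryptInParallel` are hypothetical names, the XOR "cipher" is a placeholder, and caching the key under a mutex is an assumption made for illustration rather than something the commit message confirms.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stateful decryptor: it owns a scratch buffer,
// so a single instance must not be used by two threads at the same time.
class ToyDecryptor {
 public:
  explicit ToyDecryptor(std::string key) : key_(std::move(key)) {}

  // XOR placeholder, for illustration only (not real cryptography).
  std::string Decrypt(const std::string& input) {
    scratch_.assign(input.begin(), input.end());
    for (std::size_t i = 0; i < scratch_.size(); ++i) {
      scratch_[i] = static_cast<char>(scratch_[i] ^ key_[i % key_.size()]);
    }
    return std::string(scratch_.begin(), scratch_.end());
  }

 private:
  std::string key_;
  std::vector<char> scratch_;  // mutable per-instance state
};

// Hypothetical file-level object shared by all reader threads.
class ToyFileDecryptor {
 public:
  // Returns a fresh decryptor on every call; no decryptor object is cached.
  std::unique_ptr<ToyDecryptor> GetColumnDecryptor(const std::string& column) {
    return std::make_unique<ToyDecryptor>(GetKey(column));
  }

  int key_fetches() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return key_fetches_;
  }

 private:
  // Assumption: only the key is cached, guarded by a mutex.
  std::string GetKey(const std::string& column) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = keys_.find(column);
    if (it == keys_.end()) {
      ++key_fetches_;  // stands in for an expensive key-server round trip
      it = keys_.emplace(column, "key-for-" + column).first;
    }
    return it->second;
  }

  mutable std::mutex mutex_;
  std::map<std::string, std::string> keys_;
  int key_fetches_ = 0;
};

// Decrypts the same payload from several threads, each with its own decryptor.
std::vector<std::string> DecryptInParallel(ToyFileDecryptor& file,
                                           const std::string& column,
                                           const std::string& ciphertext,
                                           std::size_t num_threads) {
  std::vector<std::string> results(num_threads);
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < num_threads; ++t) {
    workers.emplace_back([&file, &column, &ciphertext, &results, t] {
      auto decryptor = file.GetColumnDecryptor(column);
      results[t] = decryptor->Decrypt(ciphertext);  // each thread writes its own slot
    });
  }
  for (auto& worker : workers) worker.join();
  return results;
}
```

In this sketch each thread pays for constructing its own decryptor, while the simulated key lookup happens once per column. Whether that trade-off suits a real reader depends on how costly decryptor construction is relative to key retrieval.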