Commits


Donald Tolley authored and GitHub committed 0793432ad0e
GH-29238 [C++][Dataset][Parquet] Support parquet modular encryption in the new Dataset API (#34616)

### Rationale for this change

The purpose of this pull request is to support Parquet modular encryption in the new Dataset API. See https://docs.google.com/document/d/13EysCNC6-Nu9wnJ8YpdzmD-aMLn4i2KXUJTNqIihy7A/edit# for the supporting document.

### What changes are included in this PR?

I updated the C++ and Python code so that the Dataset API can apply encryption settings per file. Previously, the Dataset API applied the same encryption properties to every file it saved.

In the Python code, the ParquetFormat class now accepts DatasetEncryptionConfiguration and DatasetDecryptionConfiguration structures. You can pass the format object to the write_dataset function to set unique encryption properties for each file in your Dataset.

### Are these changes tested?

Yes, unit tests are included. I have also included a Python sample project.

### Are there any user-facing changes?

Yes. As stated above, the ParquetFormat class has optional parameters for DatasetEncryptionConfiguration and DatasetDecryptionConfiguration, exposed through setters and getters. With these, a Dataset can set different encryption properties for each file.

* Closes: #29238

Lead-authored-by: Don <tolleybot@gmail.com>
Co-authored-by: Donald Tolley <tolleybot@gmail.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Co-authored-by: anjakefala <anja@voltrondata.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Co-authored-by: Weston Pace <weston.pace@gmail.com>
Co-authored-by: Gang Wu <ustcwg@gmail.com>
Co-authored-by: scoder <stefan_ml@behnel.de>
Co-authored-by: Will Jones <willjones127@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
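
The shift from one shared set of encryption properties to per-file properties can be pictured with a small, library-free sketch. It is not the pyarrow or Arrow C++ API: `EncryptionConfig`, `FileEncryptionProperties`, `properties_for_file`, and `plan_dataset_write` are hypothetical names invented for illustration. It also assumes that each file receives its own randomly generated data keys while the master key identifiers are shared across the dataset.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class EncryptionConfig:
    """Hypothetical dataset-wide settings: which master key protects what."""
    footer_key_id: str
    column_key_ids: dict  # master key id -> list of column names


@dataclass(frozen=True)
class FileEncryptionProperties:
    """Hypothetical per-file result: data keys that belong to one file only."""
    path: str
    footer_key_id: str
    footer_data_key: bytes
    column_data_keys: dict  # column name -> data key bytes


def properties_for_file(config, path):
    """Derive fresh encryption properties for a single output file."""
    column_data_keys = {
        column: secrets.token_bytes(16)  # assumption: 128-bit random data keys
        for columns in config.column_key_ids.values()
        for column in columns
    }
    return FileEncryptionProperties(
        path=path,
        footer_key_id=config.footer_key_id,
        footer_data_key=secrets.token_bytes(16),
        column_data_keys=column_data_keys,
    )


def plan_dataset_write(config, paths):
    """Return one independent properties object per file, not a shared one."""
    return [properties_for_file(config, path) for path in paths]


# Illustrative key ids and file names; none of these come from the PR.
config = EncryptionConfig(
    footer_key_id="footer_key",
    column_key_ids={"col_key": ["a", "b"]},
)
plan = plan_dataset_write(config, ["part-0.parquet", "part-1.parquet"])
```

In this sketch the configuration is created once, and the properties are derived again for every file, so two files written from the same configuration do not share data keys.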