Adam Reeve authored and GitHub committed 33ed1dbbf79
GH-25986: [C++] Enable external material and rotation for encryption keys (#34181)

This PR replaces #10491, which appears to have been abandoned. I've fixed the issues raised in its review comments, as well as a few others I noticed. I've also made one significant API change. Rather than operating on a whole directory, `CryptoFactory::RotateMasterKeys` now rotates keys for a single file. This is more flexible because it lets users decide which files to rotate keys for, and there didn't appear to be a good reason to work at the whole-directory level. It does mean the API diverges slightly from the parquet-mr API, though.

### Rationale for this change

Using external key material allows master encryption keys to be rotated without rewriting Parquet file data. See https://docs.google.com/document/d/1bEu903840yb95k9q2X-BlsYKuXoygE4VnMDl9xz_zhk/edit?usp=sharing for more details.

### What changes are included in this PR?

Adds support for writing and reading external key material for Parquet files from C++, as well as a new `CryptoFactory::RotateMasterKeys` function that re-encrypts key encryption keys or data encryption keys with the latest versions of the master keys.

### Are these changes tested?

Yes, unit tests are included. I've also added a test that reads a file generated with parquet-mr (used by Spark) from the parquet-testing repository. This new test will only pass once the PR at https://github.com/apache/parquet-testing/pull/36 is merged and the parquet-testing submodule is updated.

### Are there any user-facing changes?

Yes. Setting the existing `internal_key_material` option in `parquet::encryption::EncryptionConfiguration` to `false` now works and uses external key material. This requires passing two new parameters, `file_path` and `file_system`, to `CryptoFactory::GetFileEncryptionProperties` and `CryptoFactory::GetFileDecryptionProperties`, so that we know where to write or read the external key material.
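The rotation argument rests on envelope encryption: column data is encrypted with data encryption keys (DEKs), and only wrapped copies of those keys are stored. With external key material, the wrapped keys live in a separate file next to the Parquet file, which is why the file path and file system are needed. The following is a minimal, standard-library-only C++17 sketch of that idea, not Arrow's implementation. It assumes the parquet-mr sidecar naming (`_KEY_MATERIAL_FOR_<file name>.json` in the same directory). `KeyMaterialPath`, `KeyMaterial`, `ToyKms`, `XorWrap` and `Rotate` are hypothetical names, and XOR stands in for a real KMS wrap call only to show which bytes change.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <map>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical helper: where the external key material for a Parquet file
// lives, ASSUMING the parquet-mr convention "_KEY_MATERIAL_FOR_<name>.json"
// in the same directory as the data file.
std::string KeyMaterialPath(const std::string& parquet_file_path) {
  const std::filesystem::path p(parquet_file_path);
  return (p.parent_path() /
          ("_KEY_MATERIAL_FOR_" + p.filename().string() + ".json"))
      .string();
}

// Toy stand-in for a KMS wrap/unwrap call. XOR is NOT secure encryption;
// it is used only because it is its own inverse.
Bytes XorWrap(const Bytes& key, const Bytes& master) {
  assert(!master.empty());  // avoid modulo by zero below
  Bytes out(key.size());
  for (std::size_t i = 0; i < key.size(); ++i) {
    out[i] = static_cast<std::uint8_t>(key[i] ^ master[i % master.size()]);
  }
  return out;
}

// Hypothetical contents of the sidecar file (not the real JSON schema).
struct KeyMaterial {
  std::string master_key_id;
  int master_key_version;
  Bytes wrapped_dek;  // DEK wrapped directly by the master key (single wrapping)
};

// Toy KMS holding every version of one master key, keyed by version number.
struct ToyKms {
  std::map<int, Bytes> versions;
  int Latest() const {
    assert(!versions.empty());
    return versions.rbegin()->first;  // highest version number
  }
};

// Re-wrap the DEK with the latest master key version. Only the key material
// changes; data already encrypted with the DEK is never read or rewritten.
void Rotate(KeyMaterial& km, const ToyKms& kms) {
  const Bytes dek =
      XorWrap(km.wrapped_dek, kms.versions.at(km.master_key_version));
  km.master_key_version = kms.Latest();
  km.wrapped_dek = XorWrap(dek, kms.versions.at(km.master_key_version));
}
```

Rotation only rewrites the small sidecar; the encrypted column data stays byte-for-byte identical. Under double wrapping, the same re-wrap step would instead apply to the key encryption keys, leaving the wrapped DEKs unchanged.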
Note that this means external key material won't work from Python until the new parameters are also exposed in Python. This changes the `CryptoFactory` ABI, but the API remains source compatible. Is this okay, or should new overloads be added that take the new parameters? `CryptoFactory::RotateMasterKeys` is also a new public-facing API.

* Closes: #25986

Lead-authored-by: Adam Reeve <adreeve@gmail.com>
Co-authored-by: Revital Sur <eres@il.ibm.com>
Co-authored-by: Maya Anderson <mayaa@il.ibm.com>
Signed-off-by: Will Jones <willjones127@gmail.com>