Commits


Maya Anderson authored and Antoine Pitrou committed 88bccab18a4
ARROW-14114: [C++][Parquet] Fix multi-threaded read of PME files

Change AesDecryptor to be per Decryptor instead of shared. This fixes reading PME (Parquet Modular Encryption) files with multiple threads.

Details:

While exposing high-level PME in PyArrow, it was discovered that reading an encrypted Parquet file in PyArrow intermittently fails at decryption finalization and sometimes fails with a segmentation fault. The same happens in C++ when reading an encrypted Parquet file with FileReader.ReadTable() in multi-threaded mode (with set_use_threads(true)).

The current implementation uses two caches of AesDecryptors, meta_decryptor_ and data_decryptor_, and every Decryptor gets the same AesDecryptor (with its AesDecryptorImpl) from this cache. However, AesDecryptor::AesDecryptorImpl::GcmDecrypt() and AesDecryptor::AesDecryptorImpl::CtrDecrypt() use the ctx_ member of type EVP_CIPHER_CTX from OpenSSL, which shouldn't be used from multiple threads concurrently.

So, instead of sharing the same AesDecryptor between all Decryptors, an AesDecryptor will be created per Decryptor, which is per column.

Co-authored-by: Gidon Gershinsky <ggershinsky@apple.com>

CC @thamht4190 @pitrou @revit13

Closes #12778 from andersonm-ibm/multithreaded_read

Lead-authored-by: Maya Anderson <mayaa@il.ibm.com>
Co-authored-by: Gidon Gershinsky <ggershinsky@apple.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
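
The ownership change described in the commit message can be illustrated with a minimal sketch. This is not the Arrow/Parquet code: ToyCipherCtx, ToyDecryptor and DecryptColumnsInParallel are hypothetical names, and a toy XOR transform stands in for OpenSSL's EVP_CIPHER_CTX so that the example builds without OpenSSL. The only point it makes is that each decryptor owns its own stateful context, so one thread per column never touches another column's context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a stateful cipher context such as OpenSSL's
// EVP_CIPHER_CTX. Init/Update/Final mutate internal state, so a single
// instance must not be used by two threads at the same time.
class ToyCipherCtx {
 public:
  explicit ToyCipherCtx(uint8_t key) : key_(key) {}

  void Init() {
    pos_ = 0;
    out_.clear();
  }

  // Toy transform (XOR with key and byte position). It is symmetric, so the
  // same call both "encrypts" and "decrypts". Not real cryptography.
  void Update(const std::string& in) {
    for (char c : in) {
      out_.push_back(static_cast<char>(c ^ key_ ^ static_cast<uint8_t>(pos_)));
      ++pos_;
    }
  }

  std::string Final() {
    // Holds as long as no other thread interleaved calls on this context.
    assert(pos_ == out_.size());
    return out_;
  }

 private:
  uint8_t key_;
  std::size_t pos_ = 0;
  std::string out_;
};

// Each decryptor creates and owns its own context instead of receiving a
// shared one from a cache (the assumption this sketch illustrates).
class ToyDecryptor {
 public:
  explicit ToyDecryptor(uint8_t key)
      : ctx_(std::make_unique<ToyCipherCtx>(key)) {}

  std::string Decrypt(const std::string& ciphertext) {
    ctx_->Init();
    ctx_->Update(ciphertext);
    return ctx_->Final();
  }

 private:
  std::unique_ptr<ToyCipherCtx> ctx_;
};

// One thread per column, one decryptor (and therefore one context) per thread.
std::vector<std::string> DecryptColumnsInParallel(
    const std::vector<std::string>& columns, uint8_t key) {
  std::vector<std::string> results(columns.size());
  std::vector<std::thread> workers;
  for (std::size_t i = 0; i < columns.size(); ++i) {
    workers.emplace_back([&columns, &results, key, i] {
      ToyDecryptor decryptor(key);  // not shared with any other thread
      for (int round = 0; round < 1000; ++round) {
        results[i] = decryptor.Decrypt(columns[i]);  // each thread writes its own slot
      }
    });
  }
  for (auto& worker : workers) worker.join();
  return results;
}
```

If all threads instead held a pointer to one ToyCipherCtx, the Init/Update/Final sequences would interleave and the output would be corrupted or the program could crash, which mirrors the intermittent finalization failures and segmentation faults reported above.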