Commits


Thomas Newton authored and GitHub committed 23dfd0e8643
GH-37511: [C++] Implement file reads for Azure filesystem (#38269)

### Rationale for this change

We want a C++ implementation of an Azure filesystem. Reading files is the first step.

### What changes are included in this PR?

Adds an implementation of `io::RandomAccessFile` for Azure blob storage (with or without hierarchical namespace (HNS), a.k.a. datalake gen 2). This is largely copied from https://github.com/apache/arrow/pull/12914. Using this `io::RandomAccessFile` implementation we implement the input file and stream methods of the `AzureFileSystem`.

I've made a few changes to the implementation from https://github.com/apache/arrow/pull/12914. The biggest one is removing use of the Azure SDK datalake APIs. These APIs cannot be tested with `azurite`, they are only beneficial for listing operations on HNS-enabled accounts, and detecting an HNS-enabled account is quite difficult (unless you use significantly elevated Azure permissions). Adding 2 different code paths for normal blob storage and datalake gen 2 seems like a bad idea to me except in cases where there is a performance advantage.

I also made a few other tweaks to some of the error handling and to make things more consistent with the S3 or GCS filesystems.

### Are these changes tested?

Yes. The tests are all based on the tests from the GCS filesystem with minimal changes. I remember reading a review comment on https://github.com/apache/arrow/pull/12914 which recommended this approach. There are a few places where the GCS tests relied on file writes or file info methods, so I've replaced those with direct calls to the Azure blob client and left TODO comments saying to switch them to use the `AzureFileSystem` when the relevant methods are implemented.

### Are there any user-facing changes?

Yes. File reads using the Azure filesystem are now supported.
* Closes: #37511

Lead-authored-by: Thomas Newton <thomas.w.newton@gmail.com>
Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
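
For readers unfamiliar with the random-access read pattern this commit implements, the following is a minimal, self-contained sketch of the general idea: positional and sequential reads layered on top of a ranged download. It is not the code from the PR and does not use Arrow or the Azure SDK. `RangedReader`, `RangeFetcher`, and `MakeInMemoryReader` are hypothetical names introduced only for illustration, and an in-memory string stands in for a blob.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a ranged blob download: returns `length` bytes
// starting at `offset`. A real filesystem would call its storage client here.
using RangeFetcher = std::function<std::string(int64_t offset, int64_t length)>;

// Illustrative random-access reader (NOT Arrow's io::RandomAccessFile).
class RangedReader {
 public:
  RangedReader(int64_t size, RangeFetcher fetch)
      : size_(size), fetch_(std::move(fetch)) {
    assert(size_ >= 0);
  }

  int64_t size() const { return size_; }
  int64_t Tell() const { return pos_; }

  void Seek(int64_t position) {
    if (position < 0) throw std::invalid_argument("negative seek position");
    pos_ = position;
  }

  // Positional read: in this sketch it never moves the stream position.
  std::string ReadAt(int64_t position, int64_t nbytes) const {
    if (position < 0 || nbytes < 0) {
      throw std::invalid_argument("negative position or length");
    }
    // Clamp the request so reads past the end return fewer bytes
    // instead of asking the backend for an out-of-range span.
    const int64_t available = std::max<int64_t>(0, size_ - position);
    const int64_t to_read = std::min(nbytes, available);
    if (to_read == 0) return std::string();
    return fetch_(position, to_read);
  }

  // Sequential read: reads at the current position, then advances it
  // by the number of bytes actually returned.
  std::string Read(int64_t nbytes) {
    std::string out = ReadAt(pos_, nbytes);
    pos_ += static_cast<int64_t>(out.size());
    return out;
  }

 private:
  int64_t size_;
  RangeFetcher fetch_;
  int64_t pos_ = 0;
};

// Builds a reader over an in-memory string that plays the role of a blob.
RangedReader MakeInMemoryReader(std::string blob) {
  const int64_t size = static_cast<int64_t>(blob.size());
  return RangedReader(
      size, [blob = std::move(blob)](int64_t offset, int64_t length) {
        return blob.substr(static_cast<std::size_t>(offset),
                           static_cast<std::size_t>(length));
      });
}
```

Keeping the backend behind a single ranged-fetch callback means the clamping and position bookkeeping can be exercised without any network access, which is similar in spirit to testing against an emulator such as `azurite`.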