

Thomas Newton authored and GitHub committed 75a04030996
GH-38335: [C++] Implement `GetFileInfo` for a single file in Azure filesystem (#38505)

### Rationale for this change

`GetFileInfo` is an important part of an Arrow filesystem implementation.

### What changes are included in this PR?

- Start `azurefs_internal`, similar to the GCS and S3 filesystems.
- Implement `HierarchicalNamespaceDetector`.
  - This does not use the obvious and simple implementation. Instead, it uses a more complicated option inspired by `hadoop-azure` that avoids requiring the significantly elevated permissions needed for `blob_service_client->GetAccountInfo()`.
  - The namespace type can't be detected at initialisation time of the filesystem because detection requires a `container_name`. It's packed into its own class so that the result can be cached.
- Implement `GetFileInfo` for single paths.
  - Supports hierarchical or flat namespace accounts and takes advantage of the hierarchical namespace where possible to avoid unnecessary extra calls to blob storage. The performance difference is noticeable just from running the `GetFileInfoObjectWithNestedStructure` test against real flat and hierarchical accounts: it takes about 3 seconds with a hierarchical namespace versus 5 seconds with a flat namespace.
- Update tests marked TODO(GH-38335) to use this implementation of `GetFileInfo` in place of the temporary direct Azure SDK usage.
- Rename the main test fixture and introduce new ones for connecting to real blob storage. If details of real blob storage are not provided, the real blob storage tests are skipped.

### Are these changes tested?

Yes. There are new Azurite-based tests for everything that can be tested with Azurite.

There are also some tests designed to run against a real blob storage account. This is because [Azurite cannot emulate a hierarchical namespace account](https://github.com/Azure/Azurite/issues/553). Additionally, some of the behaviour used to detect a hierarchical namespace account is different on Azurite compared to a real flat namespace account.
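The idea of running the namespace probe lazily and reusing its answer can be sketched as follows. This is a minimal, hypothetical illustration rather than the Arrow code: `CachingNamespaceDetector`, `ProbeOutcome` and the injected probe callable are invented names, and the probe stands in for whatever permission-light request the real detector issues against a container. Because a hierarchical namespace is a property of the whole storage account, the sketch caches one answer rather than one per container, and it caches nothing when the probe fails for an unrelated reason so that a later call can retry.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <utility>

// Hypothetical result of the permission-light probe issued against one container.
enum class ProbeOutcome {
  kSucceeded,       // the probe worked: hierarchical namespace is enabled
  kRejectedAsFlat,  // the service rejected it the way flat accounts do
  kOtherError,      // unrelated failure (network, auth, ...): answer unknown
};

// Sketch only: runs the probe on first use, because it needs a container name
// that is not known when the filesystem is created, then caches the answer.
class CachingNamespaceDetector {
 public:
  using Probe = std::function<ProbeOutcome(const std::string& container)>;

  explicit CachingNamespaceDetector(Probe probe) : probe_(std::move(probe)) {
    assert(probe_ && "probe must be callable");
  }

  // Returns whether the account has a hierarchical namespace, or std::nullopt
  // if the probe failed for an unrelated reason (nothing is cached then).
  std::optional<bool> Enabled(const std::string& container) {
    if (enabled_.has_value()) return enabled_;  // account-wide, so reuse it
    switch (probe_(container)) {
      case ProbeOutcome::kSucceeded:
        enabled_ = true;
        break;
      case ProbeOutcome::kRejectedAsFlat:
        enabled_ = false;
        break;
      case ProbeOutcome::kOtherError:
        break;  // leave uncached so a later call can retry
    }
    return enabled_;
  }

 private:
  Probe probe_;
  std::optional<bool> enabled_;
};
```

Once one container has answered, later lookups for any container in the same account cost no further requests.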
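Why a hierarchical namespace lets `GetFileInfo` skip calls can be shown with a simplified in-memory model. On a flat namespace, directories are only implied by `/`-separated blob names, so a path with no blob of its own may still be a directory and needs a second, prefix-listing request to decide. With a hierarchical namespace, directories are real entries, so one properties lookup is conclusive. Everything here (`FakeContainer`, `GetFileType`, the request counter) is hypothetical and only models round trips; it does not call the Azure SDK and is not the PR's implementation.

```cpp
#include <cassert>
#include <map>
#include <string>

enum class FileType { kNotFound, kFile, kDirectory };

// Hypothetical in-memory stand-in for a container. Maps an entry name to
// "is this an explicit directory entry?". A flat namespace normally holds only
// blobs (false); a hierarchical namespace also holds real directories (true).
struct FakeContainer {
  std::map<std::string, bool> entries;
  mutable int requests = 0;  // simulated round trips to storage

  // Simulated properties lookup on one exact path; nullptr if it does not exist.
  const bool* Properties(const std::string& path) const {
    ++requests;
    auto it = entries.find(path);
    return it == entries.end() ? nullptr : &it->second;
  }

  // Simulated "list at most one entry under this prefix".
  bool AnyWithPrefix(const std::string& prefix) const {
    ++requests;
    // The first key >= prefix starts with prefix iff any key does.
    auto it = entries.lower_bound(prefix);
    return it != entries.end() && it->first.compare(0, prefix.size(), prefix) == 0;
  }
};

FileType GetFileType(const FakeContainer& container, const std::string& path,
                     bool hierarchical) {
  assert(!path.empty() && path.back() != '/');
  if (const bool* is_dir = container.Properties(path)) {
    return *is_dir ? FileType::kDirectory : FileType::kFile;
  }
  // Hierarchical namespace: directories are real, so a miss is conclusive.
  if (hierarchical) return FileType::kNotFound;
  // Flat namespace: the path may still be a directory implied by blob names.
  return container.AnyWithPrefix(path + "/") ? FileType::kDirectory
                                             : FileType::kNotFound;
}
```

With only the blob `dir/file.txt` stored, asking the flat model about `dir` takes two simulated requests before it reports a directory, while a miss on the hierarchical model costs a single request.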
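Skipping the real blob storage tests when no connection details are configured comes down to an environment-variable check in the fixture setup. A minimal sketch, assuming the account name and key arrive in two variables whose names the caller supplies (the PR text does not name them, so `ReadRealAccountConfig` and `RealAccountConfig` are hypothetical); a GoogleTest fixture could call `GTEST_SKIP()` when this returns `std::nullopt`.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical connection details for one real storage account.
struct RealAccountConfig {
  std::string account_name;
  std::string account_key;
};

// Reads the account name and key from the named environment variables.
// Returns std::nullopt if either is unset or empty; a real-storage test
// fixture would then skip its tests instead of failing them.
std::optional<RealAccountConfig> ReadRealAccountConfig(const char* name_var,
                                                       const char* key_var) {
  assert(name_var != nullptr && key_var != nullptr);
  const char* name = std::getenv(name_var);
  const char* key = std::getenv(key_var);
  if (name == nullptr || key == nullptr || *name == '\0' || *key == '\0') {
    return std::nullopt;
  }
  return RealAccountConfig{name, key};
}
```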
The real storage account tests are skipped automatically unless environment variables are provided with details for connecting to the relevant real storage accounts. Initially I based the tests on the GCS filesystem, but I added a few extras where I thought it was appropriate.

### Are there any user-facing changes?

Yes. `GetFileInfo` is now supported on the Azure filesystem.

* Closes: #38335

Lead-authored-by: Thomas Newton <thomas.w.newton@gmail.com>
Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>