Commits


emkornfield authored and GitHub committed 7b5912deea1
ARROW-14892: [Python][C++] GCS Bindings (#12763)

Incorporate the GCS file system into Python, along with other bug fixes.

Bugs/Other changes:

- Add GCS bindings, mostly based on the AWS bindings in Python, and associated unit tests.
- Fix Tell, which was incorrect: it double counted when the stream was constructed with an offset.
- Set the define in config.cmake, which was previously missed. As a result, `FileSystemFromUri` was never tested and didn't compile; this is now fixed.
- Refine logic for GetFileInfo with a single path to recognize prefixes followed by a slash as a directory. This allows datasets to work as expected with a toy dataset generated on the local filesystem and copied to the cloud (I believe this is typical of how other systems write to GCS as well).
- Switch the convention for creating directories to always end in "/" and make use of this as another indicator. From testing with a sample Iceberg table it appears this is the convention used for Hive partitioning, so I assume this is common practice for other Hive-related writers (i.e. what we want to support).
- Fix bug introduced in https://github.com/apache/arrow/commit/a5e45cecb24229433b825dac64e0ffd10d400e8c which caused failures when a deletion occurred on a bucket (not an object in the bucket).
- Ensure output streams are closed on destruction (this is consistent with S3).

Lead-authored-by: Micah Kornfield <micahk@google.com>
Co-authored-by: emkornfield <emkornfield@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
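
The directory-detection rule described in the commit message (a prefix followed by a slash counts as a directory, and directory markers end in "/") can be illustrated with a small sketch. This is not the Arrow implementation: `FileType`, `classify_path`, and the sample object names are hypothetical, and the object store is modeled as a plain list of object names rather than a real GCS bucket.

```python
from enum import Enum


class FileType(Enum):
    # Hypothetical result type for this sketch only.
    NOT_FOUND = "not_found"
    FILE = "file"
    DIRECTORY = "directory"


def classify_path(path, object_names):
    """Classify a single path against a flat list of object names.

    Assumed rules (a simplification of what the commit describes):
    1. An object named "<path>/" is a directory marker.
    2. An object named exactly "<path>" is a file.
    3. Any object whose name starts with "<path>/" makes <path> a directory.
    """
    key = path.rstrip("/")
    prefix = key + "/"
    names = set(object_names)
    if prefix in names:
        return FileType.DIRECTORY
    if key in names:
        return FileType.FILE
    if any(name.startswith(prefix) for name in names):
        return FileType.DIRECTORY
    return FileType.NOT_FOUND


# Hypothetical bucket contents: one data file under an implicit directory,
# one explicit directory marker, and one top-level file.
objects = [
    "data/year=2021/part-0.parquet",
    "data/year=2022/",
    "readme.txt",
]

for candidate in ["data", "data/year=2022", "readme.txt", "dat"]:
    print(candidate, classify_path(candidate, objects).value)
```

Requiring the trailing slash in the prefix check is what keeps "dat" from being treated as a directory merely because "data/..." exists.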