Commits

David Li authored bafaa76bad8
ARROW-15265: [C++] Fix hang in dataset writer with kDeleteMatchingPartitions and #partitions >= 8

When the dataset writer is configured to delete existing data before writing, the target directory is on S3, the dataset is partitioned, and there are at least as many partitions as threads in the I/O thread pool, the writer would hang. The writer spawns a task on the I/O thread pool for each partition to delete existing data. However, S3FS implemented the relevant filesystem call by asynchronously listing the objects on the I/O thread pool, then deleting them, blocking until this completed. This nested asynchrony caused the program to hang.

The fix is to perform the deletion fully asynchronously, so that nothing blocks. Using the default implementation of the async filesystem methods is sufficient: it merely spawns another task on the I/O thread pool, but it lets the writer avoid blocking. This PR also refactors the S3FS internals to implement the call truly asynchronously.

This PR also implements FileInterface::CloseAsync. This is required because, by default, S3 files perform writes asynchronously in the background and Close() blocks until those complete. Both the blocking call and the background writes consume the I/O thread pool, so an async version of Close() is needed to avoid the same deadlock.

Closes #12099 from lidavidm/arrow-15265

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
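
As a hedged illustration of the failure mode described above (a standalone sketch, not Arrow code; the pool class and all names are hypothetical), the example below runs "delete partition" tasks on a fixed-size thread pool, and each task blocks on a nested "list objects" task submitted to the same pool. When the number of partitions reaches the pool size, every worker thread is parked waiting on a nested task that can never be scheduled, so the program hangs by design; lowering kPartitions below kThreads lets it finish.

```cpp
// Minimal sketch of the nested-asynchrony deadlock: outer tasks saturate the
// pool and then block on inner tasks submitted to the same pool.
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

class FixedPool {
 public:
  explicit FixedPool(std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) workers_.emplace_back([this] { Run(); });
  }
  ~FixedPool() {
    { std::lock_guard<std::mutex> lock(mu_); done_ = true; }
    cv_.notify_all();
    for (auto& w : workers_) w.join();
  }
  // Enqueue a task and return a future that completes when it has run.
  std::future<void> Submit(std::function<void()> task) {
    auto p = std::make_shared<std::promise<void>>();
    auto fut = p->get_future();
    {
      std::lock_guard<std::mutex> lock(mu_);
      queue_.push_back([task = std::move(task), p] { task(); p->set_value(); });
    }
    cv_.notify_one();
    return fut;
  }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
        if (done_ && queue_.empty()) return;
        task = std::move(queue_.front());
        queue_.pop_front();
      }
      task();
    }
  }
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  std::vector<std::thread> workers_;
  bool done_ = false;
};

int main() {
  constexpr std::size_t kThreads = 8;     // size of the "I/O thread pool"
  constexpr std::size_t kPartitions = 8;  // >= kThreads triggers the hang
  FixedPool pool(kThreads);

  std::vector<std::future<void>> outer;
  for (std::size_t i = 0; i < kPartitions; ++i) {
    outer.push_back(pool.Submit([&pool, i] {
      // "Delete existing data for partition i": schedules a nested listing
      // task on the same pool, then BLOCKS on it. With every pool thread
      // busy in this lambda, the nested task never runs.
      auto inner = pool.Submit([i] { std::cout << "listed partition " << i << "\n"; });
      inner.wait();  // deadlocks when kPartitions >= kThreads
    }));
  }
  for (auto& f : outer) f.wait();  // never returns in the deadlocked case
  std::cout << "done\n";
  return 0;
}
```

The fix described in the commit follows the opposite pattern: instead of parking a pool thread on nested work, the deletion is composed asynchronously, so the outer operation completes when the underlying future does and no I/O pool thread ever blocks waiting on another pool task.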