Commits


François Saint-Jacques authored and Benjamin Kietzman committed 4d8685b46b1
ARROW-7061: [C++][Dataset] Add ignore file options to FileSystemDataSourceDiscovery

The FileSystemDataSource implementation requires that only files supported by its FileFormat are given. The following patch provides a way to filter files in the FileSystemDataSourceDiscovery by two options:

- A list of ignore prefixes, configurable via `FileSystemDiscoveryOptions.ignore_prefixes`
- The automatic filtering of files not recognized by the format, configurable via `FileSystemDiscoveryOptions.filter_supported_files`

By default all files prefixed by `.` (hidden files on Posix systems) and `_` (metadata files in HDFS/S3) will be ignored and the discovery process will ignore files not supported by the format.

Closes #5811 from fsaintjacques/ARROW-7061-dataset-skip-hidden-files and squashes the following commits:

3419bd0f6 <François Saint-Jacques> Address comments
9f87bfc20 <François Saint-Jacques> Use explicit ParquetInvalidOrCorruptedFileException
3e34a882b <François Saint-Jacques> Add FileFormat::IsSupported to filter files optimally
072a1ff4a <François Saint-Jacques> ARROW-7061: Ignore file patterns in FileSystemDataSourceDiscovery

Authored-by: François Saint-Jacques <fsaintjacques@gmail.com>
Signed-off-by: Benjamin Kietzman <bengilgit@gmail.com>
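
As a rough illustration of the ignore-prefix option described in the message above, the following is a minimal standalone sketch, not the code from this patch. The names `DiscoveryOptionsSketch`, `BaseName`, `IsIgnored` and `FilterPaths` are hypothetical, and matching the prefix against the file's base name only is an assumption.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the discovery options; not the actual Arrow struct.
// The defaults mirror the commit message: "." and "_" prefixes are ignored.
struct DiscoveryOptionsSketch {
  std::vector<std::string> ignore_prefixes = {".", "_"};
};

// Returns the final path component (the text after the last '/').
std::string BaseName(const std::string& path) {
  const auto pos = path.find_last_of('/');
  return pos == std::string::npos ? path : path.substr(pos + 1);
}

// True if the base name starts with any configured ignore prefix.
// Assumption: only the base name is checked, not parent directories.
bool IsIgnored(const std::string& path, const DiscoveryOptionsSketch& options) {
  const std::string name = BaseName(path);
  for (const auto& prefix : options.ignore_prefixes) {
    if (!prefix.empty() && name.compare(0, prefix.size(), prefix) == 0) {
      return true;
    }
  }
  return false;
}

// Keeps only the paths that do not match an ignore prefix.
std::vector<std::string> FilterPaths(const std::vector<std::string>& paths,
                                     const DiscoveryOptionsSketch& options) {
  std::vector<std::string> kept;
  for (const auto& path : paths) {
    if (!IsIgnored(path, options)) kept.push_back(path);
  }
  return kept;
}
```

Under these assumptions, `FilterPaths({"a/.crc", "a/_metadata", "a/b.parquet"}, DiscoveryOptionsSketch{})` would keep only `a/b.parquet`. The second option, filtering by format support, would need a per-file check against the format and is not sketched here.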