Commits


Benjamin Kietzman authored and Antoine Pitrou committed 17a0709dc66
ARROW-6229: [C++][Dataset] implement FileSystemBasedDataSource

Adds FileSystemBasedDataSource, which is constructed with a file format and reads a given directory recursively on construction then yields data fragments from discovered files matching that format.

Also adds FileFormat::MakeFragment which creates a(n instance of a subclass of) FileBasedDataFragment given a FileSource and ScanOptions.

moved PR from: https://github.com/fsaintjacques/arrow/pull/2

Closes #5139 from bkietz/6229-Add-a-DataSource-implemen and squashes the following commits:

85e2cc425 <Benjamin Kietzman> iwyu: vector
d5cd207ef <Benjamin Kietzman> refactor fs datasource tests into a mixin
1a985480d <Benjamin Kietzman> correct call_traits::is_overloaded
b72866afe <Benjamin Kietzman> refactor single_call
11d568741 <Benjamin Kietzman> remove checks for file existence from MakeFragment()
704a5bb19 <Benjamin Kietzman> remove assert which incorrectly assumes ordering
385d8dcb0 <Benjamin Kietzman> add check for existence to ParquetFileFormat
7ea4d7058 <Benjamin Kietzman> add tests demostrating failure on deleted files
8b7fb3164 <Benjamin Kietzman> nullptr in a header
981abb923 <Benjamin Kietzman> implement FileSystemBasedDataSource

Authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>