Commits


Qingping Hou authored and Andy Grove committed ee09cb6edce
ARROW-8839: [Rust] [DataFusion] support CSV schema inference in logical plan

This PR changes the schema argument of the scan_csv method to `Option<&Schema>`. Other related changes needed to make this happen:

* added a delimiter argument to all CSV-related structs and functions
* fixed a bug in the schema field inference function
* made `arrow::csv::reader::infer_file_schema` public so it can be used by DataFusion

Known limitations:

* when provided with a directory of CSV files, the schema inference code only reads rows from the first file
* to avoid adding yet another argument to all CSV-related functions, the number of rows to read for schema inference is hard-coded to 1000

Open questions:

* Should we rename the `datasource::csv::CsvFile` struct to `CsvTable` to keep it consistent with ParquetTable and MemoryTable? The implementation of CsvFile also supports reading from a directory of files, so `CsvFile` is not an accurate name.
* The argument lists of CSV-related functions are getting long. Should we introduce a CSV option struct to capture the following configs with sensible defaults?
  - schema
  - has_header
  - delimiter
  - infer_max_read_records

Closes #7210 from houqp/csv_schema_infer

Authored-by: Qingping Hou <dave2008713@gmail.com>
Signed-off-by: Andy Grove <andygrove73@gmail.com>
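
For illustration only, the CSV option struct raised in the open questions might look roughly like the sketch below. It is not part of this PR. The name `CsvOptions`, the stand-in schema type (a list of column name and type name pairs instead of `arrow::datatypes::Schema`), and the `has_header = true` and comma defaults are all assumptions. The only value taken from the commit message is the 1000-row inference limit.

```rust
// Hypothetical sketch: `CsvOptions` and its defaults are assumptions,
// not an API that exists in Arrow or DataFusion.
#[derive(Debug, Clone, PartialEq)]
pub struct CsvOptions {
    /// Explicit schema as (column name, type name) pairs.
    /// `None` means "infer the schema from the file".
    /// A real version would hold an Arrow schema instead of this stand-in.
    pub schema: Option<Vec<(String, String)>>,
    /// Whether the first row holds column names (assumed default: true).
    pub has_header: bool,
    /// Field delimiter as a single byte (assumed default: comma).
    pub delimiter: u8,
    /// Maximum number of rows read during schema inference.
    pub infer_max_read_records: usize,
}

impl Default for CsvOptions {
    fn default() -> Self {
        CsvOptions {
            schema: None,
            has_header: true,
            delimiter: b',',
            // Matches the value hard-coded in this PR.
            infer_max_read_records: 1000,
        }
    }
}

fn main() {
    // Callers override only the fields they care about.
    let tsv = CsvOptions {
        delimiter: b'\t',
        ..Default::default()
    };
    println!("{:?}", tsv);
}
```

With a struct like this, adding a new CSV setting would mean adding a field with a default rather than another positional argument on every CSV-related function.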