Commits


alamb authored and Andy Grove committed a84e7b0205f
ARROW-9652: [Rust][DataFusion] Error message rather than panic for external csv tables with no column defs This PR builds on https://github.com/apache/arrow/pull/7905 to do two things: 1. Make better error messages for CREATE EXTERNAL TABLE commands that are not semantically valid and prevents a subsequent panic in the planner 2. Move planning logic for CREATE EXTERNAL TABLE out of context.rs and into planer.rs The behavior before this PR: ``` CREATE EXTERNAL TABLE repro STORED AS CSV LOCATION 'repro.csv'; > select * from repro; thread 'main' panicked at 'index out of bounds: the len is 0 but the index is 0', datafusion/src/optimizer/projection_push_down.rs:238:31 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace ``` The behavior after this PR: ``` > CREATE EXTERNAL TABLE repro STORED AS CSV LOCATION 'repro.csv'; General("Column definitions required for CSV files. None found") ``` There are more detail on https://issues.apache.org/jira/browse/ARROW-9652 ## Planning Logic Consolidation into planner.rs This change could be much smaller if I had placed the checks directly in parser.rs. However, I instead propose the larger (but I think more coherent) strategy of moving the planing logic for CREATE EXTERNAL into the planner. This means that now the planner handles all the DataFusion SQL Dialects internally rather than only the SQL ones. One reason I didn't just want to put the checks into parser.rs itself was that this check seems like it is actually a semantic check rather than a syntactic one -- aka there is nothing wrong with the SQL itself, but there is something logically wrong with the statement. The PR is broken up into two commits -- one to move the CREATE EXTERNAL TABLE code into the planner and the other to add new semantic checks. I am happy to break the change into two PRs if you prefer. I am also happy to move the check to parser.rs, if you prefer. ## CSV Schema Inference Logic There appears to be some CSV schema inference logic in DataFusion (e.g. [CsvExec::try_infer et al](https://github.com/apache/arrow/blob/master/rust/datafusion/src/execution/physical_plan/csv.rs#L123) , but it does not appear to be run prior to the planner panicing. I may have missed a better intended behavior that the CSV schema inference logic gets run at CREATE EXTERNAL TABLE planning time. Closes #7906 from alamb/alamb/no-schema Authored-by: alamb <andrew@nerdnetworks.org> Signed-off-by: Andy Grove <andygrove73@gmail.com>