Commits


Ruan Pearce-Authers authored and Andrew Lamb committed 5eb6ce18e11
ARROW-9828: [Rust] [DataFusion] Support filter pushdown optimisation for TableProvider implementations

I've got a use case for this with a custom TableProvider implementation, so thought I'd give this a go :)

This PR allows TableProviders to optionally indicate that they support handling filter expressions either:
- Inexactly, to simply optimise data retrieval in an approximate fashion; e.g. pruning in your classic chunked storage system with min/max column metadata stored per chunk
- Exactly, in which case the relevant filter plan nodes can be optimised out entirely

Some preemptive concerns from my side:
- Most of these concepts could probably have better names, open to suggestions here.
- I'm not sure whether expressions are the correct thing to be pushing down to the provider.
- I've had to update quite a few `scan` callsites with empty filter lists. Could this be handled in a better way?
- Currently, only table scans using TableSource::FromProvider are supported, because we need a reference to the provider at optimisation time. #8910 removes the provider/named-based reference distinction entirely so I can rebase this once that's merged and add an extra test using an ordinary sql statement, rather than just a `ctx.read_table(provider)` call.

I'd appreciate any thoughts or feedback!

Closes #8917 from returnString/table_provider_pushdown

Authored-by: Ruan Pearce-Authers <ruanpa@outlook.com>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
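
The Inexact/Exact distinction described in the commit message can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical and simplified for illustration: `PushdownSupport`, `GreaterThan`, `Chunk`, `ChunkedTable`, `plan_filters` and `sample_table` are assumed names, not the DataFusion types or signatures introduced by this commit. The sketch assumes a single integer column `x` and a single filter shape, `column > value`.

```rust
// Hypothetical, simplified model of filter pushdown; not the DataFusion API.

/// How well a provider can handle a filter that is pushed down to it.
#[allow(dead_code)] // `Exact` is never constructed by this toy provider.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PushdownSupport {
    Unsupported, // provider ignores the filter; the plan must keep it
    Inexact,     // provider may return extra rows; the plan must keep it
    Exact,       // provider returns only matching rows; the plan can drop it
}

/// A toy filter expression: `column > value`.
#[derive(Debug, Clone, PartialEq)]
struct GreaterThan {
    column: String,
    value: i64,
}

/// A chunk of column `x` with per-chunk metadata. Only the maximum is stored,
/// because that is all a `>` filter needs for pruning.
struct Chunk {
    max: i64,
    rows: Vec<i64>,
}

impl Chunk {
    fn new(rows: Vec<i64>) -> Self {
        let max = rows.iter().copied().max().expect("chunk must be non-empty");
        Chunk { max, rows }
    }
}

struct ChunkedTable {
    chunks: Vec<Chunk>,
}

impl ChunkedTable {
    /// Chunk pruning is approximate, so filters on `x` are only `Inexact`.
    fn supports_filter_pushdown(&self, filter: &GreaterThan) -> PushdownSupport {
        if filter.column == "x" {
            PushdownSupport::Inexact
        } else {
            PushdownSupport::Unsupported
        }
    }

    /// Skip chunks whose maximum cannot satisfy a pushed filter on `x`.
    /// Surviving chunks are returned whole, so non-matching rows can remain.
    fn scan(&self, filters: &[GreaterThan]) -> Vec<i64> {
        self.chunks
            .iter()
            .filter(|c| filters.iter().all(|f| f.column != "x" || c.max > f.value))
            .flat_map(|c| c.rows.iter().copied())
            .collect()
    }
}

/// Split filters into those passed to `scan` and those kept as plan nodes.
fn plan_filters(
    table: &ChunkedTable,
    filters: Vec<GreaterThan>,
) -> (Vec<GreaterThan>, Vec<GreaterThan>) {
    let mut pushed = Vec::new();
    let mut remaining = Vec::new();
    for f in filters {
        match table.supports_filter_pushdown(&f) {
            PushdownSupport::Unsupported => remaining.push(f),
            PushdownSupport::Inexact => {
                pushed.push(f.clone());
                remaining.push(f);
            }
            PushdownSupport::Exact => pushed.push(f),
        }
    }
    (pushed, remaining)
}

fn sample_table() -> ChunkedTable {
    ChunkedTable {
        chunks: vec![
            Chunk::new(vec![1, 2, 3]),
            Chunk::new(vec![10, 20, 30]),
            Chunk::new(vec![4, 15]),
        ],
    }
}

fn main() {
    let table = sample_table();
    let filter = GreaterThan { column: "x".to_string(), value: 5 };
    let (pushed, remaining) = plan_filters(&table, vec![filter]);

    // The first chunk is pruned, but the row `4` still comes back.
    let scanned = table.scan(&pushed);
    // The filter kept in the plan removes the rows that pruning let through.
    let result: Vec<i64> = scanned
        .iter()
        .copied()
        .filter(|v| remaining.iter().all(|f| *v > f.value))
        .collect();

    println!("scanned: {:?}", scanned);
    println!("result:  {:?}", result);
}
```

In this sketch an `Inexact` filter is both pushed to the scan and kept in the plan, which is why the residual filter step is still needed. A provider that answered `Exact` would let the planner drop the filter node, matching the second bullet of the commit message.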