Commits


Heres, Daniel authored and Andy Grove committed c019e7944cf
ARROW-10885: [Rust][DataFusion] Optimize hash join build vs probe order based on number of rows This PR uses the `num_rows` statistics to implement a common optimization to use the smallest table for the build phase. This is a good heuristic, as to have the smallest table used in the build phase leads to less items to be inserted to the hash table, in particular if the size of tables is very imbalanced. Some notes: * The optimization works on the `LogicalPlan` by swapping left and right, the join type and the key order. This seems currently the easiest place to add it, as there is no cost based optimizer and/or optimizers on the physical plan yet. The optimization rule assumes that the left part of the join will be used for the build phase and the right part for the probe phase. * It requires the number of rows to be exactly known, so it will not work whenever there is a transformation changing the number of rows, except for `limit`. The idea here is that in other cases, it is very hard to estimate the number of resulting rows. * The impact currently is measurable on queries with a bigger left side of an (inner) join FYI @andygrove @jorgecarleitao Closes #8961 from Dandandan/rows_hash Lead-authored-by: Heres, Daniel <danielheres@gmail.com> Co-authored-by: Daniël Heres <danielheres@gmail.com> Signed-off-by: Andy Grove <andygrove73@gmail.com>