Commits


Neal Richardson authored and GitHub committed 47a602dbd9b
GH-34437: [R] Use FetchNode and OrderByNode (#34685)

### Rationale for this change

See also #32991. By using the new nodes, we're closer to having all dplyr query business happening inside the ExecPlan. Unfortunately, there are still two cases where we have to apply operations in R after running a query:

* #34941: Taking head/tail on unordered data, which has non-deterministic results but should still be possible for the case where the user wants to see a slice of the result, any slice.
* #34942: Implementing tail in the FetchNode or similar would enable removing more hacks and workarounds.

Once those are resolved, we can simplify further and then move to the new Declaration class.

### What changes are included in this PR?

This removes the use of different SinkNodes and many R-specific workarounds to support sorting and head/tail, so *almost* everything we do in a query should be represented in an ExecPlan.

### Are these changes tested?

Yes. This is mostly an internal refactor, but behavior changes are accompanied by test updates.

### Are there any user-facing changes?

The `show_query()` method will print slightly different ExecPlans. In many cases, they will be more informative.

`tail()` now actually returns the tail of the data in cases where the data has an implicit order (currently only in-memory tables). Previously it was non-deterministic (and would return the head or some other slice of the data).

When printing query objects that include `summarize()` when the `arrow.summarize.sort = TRUE` option is set, the sorting is correctly printed.

It's unclear if there should be changes in performance; running benchmarks would be good, but it's also not clear that our benchmarks cover all affected scenarios.

* Closes: #34437
* Closes: #31980
* Closes: #31982

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>