Commits


eitsupi authored and GitHub committed 6bd00508116
GH-35445: [R] Behavior something like group_by(foo) |> across(everything()) is different from dplyr (#35473)

### Rationale for this change

The argument `.cols` of the `dplyr::across` function has the following description.

> You can't select grouping columns because they are already automatically handled by the verb (i.e. summarise() or mutate()).

However, this behavior is currently not reproduced in the `arrow` package, and an error occurs when the column used for grouping is selected with `everything()`.

``` r
mtcars |>
  arrow::as_arrow_table() |>
  dplyr::group_by(cyl) |>
  dplyr::summarise(dplyr::across(everything(), sum)) |>
  dplyr::collect()
#> Error in `compute.arrow_dplyr_query()`:
#> ! Invalid: Multiple matches for FieldRef.Name(cyl) in mpg: double
#> cyl: double
#> disp: double
#> hp: double
#> drat: double
#> wt: double
#> qsec: double
#> vs: double
#> am: double
#> gear: double
#> carb: double
#> cyl: double
#> Backtrace:
#>     ▆
#>  1. ├─dplyr::collect(...)
#>  2. └─arrow:::collect.arrow_dplyr_query(...)
#>  3.   └─arrow:::compute.arrow_dplyr_query(x)
#>  4.     └─base::tryCatch(...)
#>  5.       └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#>  6.         └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#>  7.           └─value[[3L]](cond)
#>  8.             └─arrow:::augment_io_error_msg(e, call, schema = schema())
#>  9.               └─rlang::abort(msg, call = call)
```

<sup>Created on 2023-05-05 with [reprex v2.0.2](https://reprex.tidyverse.org)</sup>

This PR fixes this behavior to match dplyr's behavior.

### What changes are included in this PR?

- Auto exclude grouping columns in `across` in `mutate`, `transmute`, and `summarise`.
- The `.data` argument of internal function `expand_across` should be `arrow_dplyr_query`. Some tests have been slightly modified to accommodate this change.
- `mutate`, `transmute`, `arrange`, `filter` always return `arrow_dplyr_query`.
  Currently, `arrow_dplyr_query` is not returned in the following case, which is inconsistent.

  ```r
  mtcars |>
    arrow::arrow_table() |>
    dplyr::mutate()
  ```

- Correct the order of columns in results of `group_by(foo) |> mutate(.keep = "none")`.
  Currently, the results of the following query show that the columns used for grouping have moved to the tail, which differs from the behavior of dplyr.

  ```r
  mtcars |>
    arrow::arrow_table() |>
    dplyr::group_by(cyl) |>
    dplyr::mutate(am, .keep = "none") |>
    dplyr::collect()
  ```

- Correct the order of columns in results of `group_by(foo) |> transmute()`.
  Currently, the results of the following query show that the columns used for grouping have moved to the tail, which differs from the behavior of dplyr.

  ```r
  mtcars |>
    arrow::arrow_table() |>
    dplyr::group_by(cyl) |>
    dplyr::transmute(mpg) |>
    dplyr::collect()
  ```

  After `transmute`, the group columns should move to the left. (This is a different behavior from `mutate(.keep = "none")`, which keeps the original position.)

### Are these changes tested?

Yes.

### Are there any user-facing changes?

Yes.

* Closes: #35445

Authored-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>