Commits


Dewey Dunnington authored and Jonathan Keane committed 24689928da8
ARROW-14297 [R] smooth out integer division to better match R This PR updates the floor division dplyr translation to better respect the input types (as determined by how this would be done in R). The main change is the output type: `integer_type_1 %/% integer_type_2` will now have the same type as `integer_type_1` and everything else has the same type as `floor(arg1 / arg2)`. As a side effect, floor division by zero is `Inf` rather than the maximum integer value (unless you try to floor divide by `0L`...see below). A few things that need some hashing out: - Floor division by `0L` results in the max integer value rather than `NA`. This is, I think, because it's how cast (even with `safe = TRUE`) to integer from `Inf`. That is perhaps a different issue than this one? - There's [some tests for floor division for arrays outside a dplyr verb](https://github.com/apache/arrow/blob/master/r/tests/testthat/test-compute-arith.R#L64-L94) that appear to be using a [completely different translation logic](https://github.com/apache/arrow/blob/master/r/R/compute.R). I didn't update those tests or that logic because it seemed like a different issue to me (maybe needs to implement the Math and/or Ops group generics or more S3 methods for the array class?). Reprex before this PR: <details> ``` r # remotes::install_github("apache/arrow/r") library(arrow, warn.conflicts = FALSE) library(dplyr, warn.conflicts = FALSE) tbl <- tibble::tibble( integers = c(1:4, NA_integer_), doubles = c(as.numeric(1:4), NA_real_) ) tbl %>% mutate( int_div_dbl = integers %/% 2, int_div_int = integers %/% 2L, int_div_zero_int = integers %/% 0L, int_div_zero_dbl = integers %/% 0, dbl_div_dbl = doubles %/% 2, dbl_div_int = doubles %/% 2L, dbl_div_zero_int = doubles %/% 0L, dbl_div_zero_dbl = doubles %/% 0 ) %>% glimpse() #> Rows: 5 #> Columns: 10 #> $ integers <int> 1, 2, 3, 4, NA #> $ doubles <dbl> 1, 2, 3, 4, NA #> $ int_div_dbl <dbl> 0, 1, 1, 2, NA #> $ int_div_int <int> 0, 1, 1, 2, NA #> $ int_div_zero_int <int> NA, NA, NA, NA, NA #> $ int_div_zero_dbl <dbl> Inf, Inf, Inf, Inf, NA #> $ dbl_div_dbl <dbl> 0, 1, 1, 2, NA #> $ dbl_div_int <dbl> 0, 1, 1, 2, NA #> $ dbl_div_zero_int <dbl> Inf, Inf, Inf, Inf, NA #> $ dbl_div_zero_dbl <dbl> Inf, Inf, Inf, Inf, NA RecordBatch$create(!!! tbl) %>% mutate( int_div_dbl = integers %/% 2, int_div_int = integers %/% 2L, int_div_zero_int = integers %/% 0L, int_div_zero_dbl = integers %/% 0, dbl_div_dbl = doubles %/% 2, dbl_div_int = doubles %/% 2L, dbl_div_zero_int = doubles %/% 0L, dbl_div_zero_dbl = doubles %/% 0, ) %>% collect() %>% glimpse() #> Rows: 5 #> Columns: 10 #> $ integers <int> 1, 2, 3, 4, NA #> $ doubles <dbl> 1, 2, 3, 4, NA #> $ int_div_dbl <int> 0, 1, 1, 2, NA #> $ int_div_int <int> 0, 1, 1, 2, NA #> $ int_div_zero_int <int> 2147483647, 2147483647, 2147483647, 2147483647, NA #> $ int_div_zero_dbl <int> 2147483647, 2147483647, 2147483647, 2147483647, NA #> $ dbl_div_dbl <int> 0, 1, 1, 2, NA #> $ dbl_div_int <int> 0, 1, 1, 2, NA #> $ dbl_div_zero_int <int> 2147483647, 2147483647, 2147483647, 2147483647, NA #> $ dbl_div_zero_dbl <int> 2147483647, 2147483647, 2147483647, 2147483647, NA ``` <sup>Created on 2021-11-09 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup> </details> Reprex after this PR: <details> ``` r # remotes::install_github("paleolimbot/arrow/r@r-floor-div") library(arrow, warn.conflicts = FALSE) library(dplyr, warn.conflicts = FALSE) tbl <- tibble::tibble( integers = c(1:4, NA_integer_), doubles = c(as.numeric(1:4), NA_real_) ) tbl %>% mutate( int_div_dbl = integers %/% 2, int_div_int = integers %/% 2L, int_div_zero_int = integers %/% 0L, int_div_zero_dbl = integers %/% 0, dbl_div_dbl = doubles %/% 2, dbl_div_int = doubles %/% 2L, dbl_div_zero_int = doubles %/% 0L, dbl_div_zero_dbl = doubles %/% 0 ) %>% glimpse() #> Rows: 5 #> Columns: 10 #> $ integers <int> 1, 2, 3, 4, NA #> $ doubles <dbl> 1, 2, 3, 4, NA #> $ int_div_dbl <dbl> 0, 1, 1, 2, NA #> $ int_div_int <int> 0, 1, 1, 2, NA #> $ int_div_zero_int <int> NA, NA, NA, NA, NA #> $ int_div_zero_dbl <dbl> Inf, Inf, Inf, Inf, NA #> $ dbl_div_dbl <dbl> 0, 1, 1, 2, NA #> $ dbl_div_int <dbl> 0, 1, 1, 2, NA #> $ dbl_div_zero_int <dbl> Inf, Inf, Inf, Inf, NA #> $ dbl_div_zero_dbl <dbl> Inf, Inf, Inf, Inf, NA RecordBatch$create(!!! tbl) %>% mutate( int_div_dbl = integers %/% 2, int_div_int = integers %/% 2L, int_div_zero_int = integers %/% 0L, int_div_zero_dbl = integers %/% 0, dbl_div_dbl = doubles %/% 2, dbl_div_int = doubles %/% 2L, dbl_div_zero_int = doubles %/% 0L, dbl_div_zero_dbl = doubles %/% 0, ) %>% collect() %>% glimpse() #> Rows: 5 #> Columns: 10 #> $ integers <int> 1, 2, 3, 4, NA #> $ doubles <dbl> 1, 2, 3, 4, NA #> $ int_div_dbl <dbl> 0, 1, 1, 2, NA #> $ int_div_int <int> 0, 1, 1, 2, NA #> $ int_div_zero_int <int> 2147483647, 2147483647, 2147483647, 2147483647, NA #> $ int_div_zero_dbl <dbl> Inf, Inf, Inf, Inf, NA #> $ dbl_div_dbl <dbl> 0, 1, 1, 2, NA #> $ dbl_div_int <dbl> 0, 1, 1, 2, NA #> $ dbl_div_zero_int <dbl> Inf, Inf, Inf, Inf, NA #> $ dbl_div_zero_dbl <dbl> Inf, Inf, Inf, Inf, NA ``` <sup>Created on 2021-11-09 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup> </details> Closes #11652 from paleolimbot/r-floor-div Authored-by: Dewey Dunnington <dewey@fishandwhistle.net> Signed-off-by: Jonathan Keane <jkeane@gmail.com>