eitsupi authored and GitHub committed 157062d4b3a on 11 Jan 2023
ARROW-17425: [R] `lubridate::as_datetime()` in dplyr query should be able to handle time in sub seconds (#13890)

This change allows `lubridate::as_datetime()` to accept strings that contain sub-second values, as well as doubles, as input.

```r
1.5 |>
  arrow::arrow_table(x = _) |>
  dplyr::mutate(
    y = lubridate::as_datetime(x)
  ) |>
  dplyr::collect() |>
  dplyr::mutate(
    z = lubridate::as_datetime(x),
    is_equal = (y == z)
  )
#>     x                   y                   z is_equal
#> 1 1.5 1970-01-01 00:00:01 1970-01-01 00:00:01     TRUE
```
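
The string case mentioned above can be sketched the same way. This is a minimal sketch, assuming an arrow build that includes this change; the input value and the column names `x` and `y` are illustrative, and no output is shown because the printed precision depends on the `digits.secs` option.

```r
# Assumption: arrow with this change applied, plus dplyr and lubridate installed.
# The timestamp string below is an illustrative value, not taken from the PR.
result <- "2023-01-11 12:34:56.789" |>
  arrow::arrow_table(x = _) |>
  dplyr::mutate(
    # Parsed inside the Arrow query; the fractional second should be kept
    y = lubridate::as_datetime(x)
  ) |>
  dplyr::collect()

# Raise the print precision to see the fractional part
options(digits.secs = 3)
print(result)
```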

Because the timestamp type produced by `as_datetime()` is expected to be used in combination with other functions, this PR also fixes a bug in ~~`as.Date` and~~ `lubridate::as_date` that could raise an error when the input was a timestamp with sub-second precision.

Edit: as.Date fixed by #14935

This is a breaking change: `as_datetime()` now returns a timestamp with nanosecond resolution. I hope the impact will be minor, because `as_datetime() |> as.integer()` and `as_datetime() |> as.numeric()` were already unusable; they try to cast the timestamp to int32 or double, which results in an error.
(A timestamp has to be cast to int64 instead.)

arrow 9.0.0

```r
1 |>
  arrow::arrow_table(x = _) |>
  dplyr::mutate(
    x = lubridate::as_datetime(x),
    y = cast(x, arrow::int64())
  ) |>
  dplyr::collect()
#>                     x y
#> 1 1970-01-01 00:00:01 1
```

This PR

```r
1 |>
  arrow::arrow_table(x = _) |>
  dplyr::mutate(
    x = lubridate::as_datetime(x),
    y = cast(x, arrow::int64())
  ) |>
  dplyr::collect()
#>                     x          y
#> 1 1970-01-01 00:00:01 1000000000
```
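
Code that relied on the old integer value being in seconds could divide the int64 result by 1e9. The following is a minimal sketch, assuming an arrow build that includes this PR; the column names `ts`, `ns` and `secs` are illustrative and not part of the change itself.

```r
# Assumption: arrow with this PR applied, so as_datetime() yields nanoseconds.
result <- 1 |>
  arrow::arrow_table(x = _) |>
  dplyr::mutate(
    ts = lubridate::as_datetime(x),
    # Nanoseconds since the Unix epoch, as in the example above
    ns = cast(ts, arrow::int64()),
    # Convert back to seconds as a double
    secs = cast(ns, arrow::float64()) / 1e9
  ) |>
  dplyr::collect()

print(result)
```

Casting to `float64` before dividing keeps any fractional seconds that an integer division would drop.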

Lead-authored-by: SHIMA Tatsuya <ts1s1andn@gmail.com>
Co-authored-by: Nic Crane <thisisnic@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>