Commits


Dragoș Moldovan-Grünfeld authored and GitHub committed 10bb61804d0
ARROW-16407: [R] Extend `parse_date_time` to cover hour, minutes, and seconds components (#13196)

This PR improves `parse_date_time()` by:

* adding support for orders with the hours, minutes, and seconds components
* adding support for unseparated strings ([ARROW-16446](https://issues.apache.org/jira/browse/ARROW-16446))
* supporting the `exact` argument:
    * `exact = TRUE` takes the `orders` as they are (they are treated as `formats` and passed to `strptime`)
    * `exact = FALSE` implies that `formats` are derived from `orders`
* allowing the `truncated` argument
    * denotes the number of formats that might be missing. For example, passing an `order` like `ymd_HMS` and a value of 1 for `truncated` will attempt parsing with both the `ymd_HMS` and `ymd_HM` orders
* erroring when the user passes `quiet = FALSE`
* improving the utility function used to generate `formats` (which are then passed on to `strptime`) from `orders`
    * less hard-coding and increased ability to deal with different orders and separators

The `ymd HMS` orders (and variants) will parse correctly:

``` r
library(dplyr, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)
library(arrow, warn.conflicts = FALSE)

test_df <- tibble(
  x = c("2011-12-31 12:59:59", "2010-01-01 12:11", "2010-01-01 12", "2010-01-01")
)

test_df %>%
  mutate(
    y = parse_date_time(x, "Ymd HMS", truncated = 3)
  )
#> # A tibble: 4 × 2
#>   x                   y
#>   <chr>               <dttm>
#> 1 2011-12-31 12:59:59 2011-12-31 12:59:59
#> 2 2010-01-01 12:11    2010-01-01 12:11:00
#> 3 2010-01-01 12       2010-01-01 12:00:00
#> 4 2010-01-01          2010-01-01 00:00:00

test_df %>%
  arrow_table() %>%
  mutate(
    y = parse_date_time(x, "Ymd HMS", truncated = 3)
  ) %>%
  collect()
#> # A tibble: 4 × 2
#>   x                   y
#>   <chr>               <dttm>
#> 1 2011-12-31 12:59:59 2011-12-31 12:59:59
#> 2 2010-01-01 12:11    2010-01-01 12:11:00
#> 3 2010-01-01 12       2010-01-01 12:00:00
#> 4 2010-01-01          2010-01-01 00:00:00
```

<sup>Created on 2022-05-19 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>

`exact = TRUE` can also be used:

<details>

``` r
library(arrow, warn.conflicts = FALSE)
library(lubridate, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

test_df <- tibble(
  x = c("11/23/1998 07:00:00", "6/18/1952 0135", "2/25/1974 0523", "9/07/1985 01", NA)
)

test_df %>%
  mutate(
    parsed_x = parse_date_time(
      x,
      c("%m/%d/%Y %I:%M:%S", "%m/%d/%Y %H%M", "%m/%d/%Y %H"),
      exact = TRUE
    )
  )
#> # A tibble: 5 × 2
#>   x                   parsed_x
#>   <chr>               <dttm>
#> 1 11/23/1998 07:00:00 1998-11-23 07:00:00
#> 2 6/18/1952 0135      1952-06-18 01:35:00
#> 3 2/25/1974 0523      1974-02-25 05:23:00
#> 4 9/07/1985 01        1985-09-07 01:00:00
#> 5 <NA>                NA

test_df %>%
  arrow_table() %>%
  mutate(
    parsed_x = parse_date_time(
      x,
      c("%m/%d/%Y %I:%M:%S", "%m/%d/%Y %H%M", "%m/%d/%Y %H"),
      exact = TRUE
    )
  ) %>%
  collect()
#> # A tibble: 5 × 2
#>   x                   parsed_x
#>   <chr>               <dttm>
#> 1 11/23/1998 07:00:00 1998-11-23 07:00:00
#> 2 6/18/1952 0135      1952-06-18 01:35:00
#> 3 2/25/1974 0523      1974-02-25 05:23:00
#> 4 9/07/1985 01        1985-09-07 01:00:00
#> 5 <NA>                NA
```

<sup>Created on 2022-05-20 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>

</details>

Authored-by: Dragoș Moldovan-Grünfeld <dragos.mold@gmail.com>
Signed-off-by: Alessandro Molina <amol@turbogears.org>
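
The `truncated` behaviour described above (an `ymd_HMS` order with `truncated = 1` is also tried as `ymd_HM`) can be sketched in base R. This is only an illustration of the idea, not the arrow implementation: `expand_truncated_orders()` is a hypothetical helper, and for simplicity it drops separators such as `_` or spaces from the order.

```r
# Hypothetical helper, not part of arrow or lubridate: list the candidate
# orders implied by `truncated` by dropping trailing components one at a time.
expand_truncated_orders <- function(order, truncated = 0) {
  # Keep only the letters that name components (assumption: separators
  # such as "_" or " " carry no meaning for this sketch)
  components <- strsplit(gsub("[^A-Za-z]", "", order), "")[[1]]
  n <- length(components)
  # Never drop every component, and never go below zero
  truncated <- max(0, min(truncated, n - 1))
  vapply(
    seq(0, truncated),
    function(k) paste(components[seq_len(n - k)], collapse = ""),
    character(1)
  )
}

expand_truncated_orders("ymd_HMS", truncated = 1)
#> [1] "ymdHMS" "ymdHM"
```

With `"Ymd HMS"` and `truncated = 3`, as in the first example above, the same helper yields four candidates, ending with the date-only `"Ymd"` order.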