Commits


Nic Crane authored and GitHub committed 444dcb67797
GH-33526: [R] Implement new function open_dataset_csv with signature more closely matching read_csv_arrow (#33614)

This PR implements a wrapper around `open_dataset()` specifically for value-delimited files. It takes the parameters of `open_dataset()` and appends those parameters of `read_csv_arrow()` that are compatible with `open_dataset()`. This should make it easier for users to switch between the two, e.g.:

``` r
library(arrow)
library(dplyr)

# Set up directory for examples
tf <- tempfile()
dir.create(tf)
on.exit(unlink(tf))
df <- data.frame(x = c("1", "2", "NULL"))

file_path <- file.path(tf, "file1.txt")
write.table(df, file_path, sep = ",", row.names = FALSE)

read_csv_arrow(file_path, na = c("", "NA", "NULL"), col_names = "y", skip = 1)
#> # A tibble: 3 × 1
#>       y
#>   <int>
#> 1     1
#> 2     2
#> 3    NA

open_csv_dataset(file_path, na = c("", "NA", "NULL"), col_names = "y", skip = 1) %>%
  collect()
#> # A tibble: 3 × 1
#>       y
#>   <int>
#> 1     1
#> 2     2
#> 3    NA
```

This PR also hooks up the readr-style `na` parameter to `null_values` (i.e. the CSVConvertOptions parameter).

In the process of making this PR, I also refactored `CsvFileFormat$create()`. Unfortunately, many changes needed to be made at once, which has considerably increased the size and complexity of this PR.

Authored-by: Nic Crane <thisisnic@gmail.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>