Javier Luraschi authored and Wes McKinney committed 32960a13546
ARROW-3479: [R] Support to write record_batch as stream

I'm using this PR as a WIP to efficiently transfer data from R to Spark using Arrow. This PR might ultimately be closed rather than merged, but I thought it would be good to give visibility into what I'm exploring. Specifically, I'm working on supporting efficient execution of:

```r
library(sparklyr)
sc <- spark_connect(master = "local")

# Time copying a 10^6-row data frame of uniform random numbers into Spark
system.time({
  tbl_data <- sdf_copy_to(sc, data.frame(y = runif(10^6, 0, 1)), "data", overwrite = TRUE)
})
```

Currently, without this PR and without using `arrow`:

```r
system.time({
  tbl_data <- sdf_copy_to(sc, data.frame(y = runif(10^6, 0, 1)), "data", overwrite = TRUE)
})
```

```
   user  system elapsed
  1.120   0.087   3.482
```

With `arrow` loaded, this drops to:

```r
library(arrow)

system.time({
  tbl_data <- sdf_copy_to(sc, data.frame(y = runif(10^6, 0, 1)), "data", overwrite = TRUE)
})
```

```
   user  system elapsed
  0.222   0.029   0.641
```

and further down to the following when using `record$to_raw()` from this PR instead of `record$to_file()`:

```
   user  system elapsed
  0.102   0.007   0.351
```

Author: Javier Luraschi <javierluraschi@hotmail.com>

Closes #2727 from javierluraschi/feature/r-to-raw and squashes the following commits:

0e302a590 <Javier Luraschi> use snake casing not camel
40f4e24d9 <Javier Luraschi> additional code review feedback
0cf4ce748 <Javier Luraschi> additional code review feedback
9e27d04b7 <Javier Luraschi> fix clang lint warnings
ec1a8c2fe <Javier Luraschi> add test for record_batch to_stream
a1580df0d <Javier Luraschi> avoid double copy under to_stream for R bindings
643371dbc <Javier Luraschi> implement record to_raw() for r bindings
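As a rough reading of the timings in this message, the sketch below turns the reported `elapsed` seconds into speedup ratios against the no-`arrow` baseline. It uses only base R and the three elapsed times quoted above; the labels `no_arrow`, `to_file` and `to_raw` are illustrative names chosen here, not part of sparklyr or arrow. Since these are timings reported from a single environment, treat the ratios as indicative rather than as a rigorous benchmark.

```r
# Elapsed wall-clock seconds reported above for copying 10^6 rows to Spark.
# The element names are hypothetical labels for the three configurations.
elapsed <- c(
  no_arrow = 3.482,  # sdf_copy_to() without arrow
  to_file  = 0.641,  # with arrow, serializing via record$to_file()
  to_raw   = 0.351   # with arrow, serializing via record$to_raw() from this PR
)

# Speedup of each configuration relative to the no-arrow baseline
speedup <- elapsed[["no_arrow"]] / elapsed

# Additional gain from writing to an in-memory raw vector instead of a file
file_vs_raw <- elapsed[["to_file"]] / elapsed[["to_raw"]]

print(round(speedup, 1))      # no_arrow 1.0, to_file 5.4, to_raw 9.9
print(round(file_vs_raw, 2))
```

On these numbers, `record$to_raw()` is roughly 1.8x faster than `record$to_file()` and close to 10x faster than the baseline without `arrow`.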