
Neal Richardson authored 21ad7ac1162
ARROW-6340 [R] Implements low-level bindings to Dataset classes

This patch follows the end-to-end C++ test on #5675 and implements enough classes in R to cover the whole path from DataSourceDiscovery -> DataSource -> Dataset -> ScannerBuilder -> Scanner -> Table -> data.frame. It also implements dplyr logic for `select` and `filter` (which plug into the `ScannerBuilder$Project()` and `$Filter()` methods), as well as the other dplyr verbs implemented in a recent patch.

See r/tests/testthat/test-dataset.R for examples of both the high-level user interface and the lower-level wrappers.

To do:
- Vignette (deferred to https://issues.apache.org/jira/browse/ARROW-7092)
- Resolve the question/hack of the FileSystem shared_ptr/global (deferred to https://issues.apache.org/jira/browse/ARROW-7094)
- ScalarExpression creation in r/src/expression.cpp is limited to logical/integer/double (deferred to https://issues.apache.org/jira/browse/ARROW-7093)
- Behavior when hitting unsupported queries: https://issues.apache.org/jira/browse/ARROW-7095

Closes #5454 from romainfrancois/ARROW-6340/Dataset and squashes the following commits:

9dfba2ea8 <Neal Richardson> Add -DARROW_DS_STATIC to configure.win
be6621c45 <Neal Richardson> Document all the things
5e34d760d <Neal Richardson> Some review feedback and cleanup
c65aaa179 <Neal Richardson> Add hive partitioning and start documenting
ca3017c7c <Neal Richardson> ScannerBuilder->UseThreads(), plus some assorted fixes and temporary hacks
e64bf35df <Neal Richardson> Cleanup some TODOs
f9183a1f9 <Neal Richardson> Add some more input validation
f1954fe39 <Neal Richardson> Update NAMESPACE
e9a61888d <Neal Richardson> Make test dataset have multiple files. Start with partitioning (but it errors)
c45340ba7 <Neal Richardson> and/or/not
3043a0378 <Neal Richardson> Simple dataset creation with open_dataset()
c68eab564 <Neal Richardson> dplyr on a Dataset
072f46d55 <Neal Richardson> More expression support
568755b73 <Neal Richardson> Add filter expressions. The bear dances!
440054357 <Neal Richardson> Add Project, schema, names methods
62a0809e4 <Neal Richardson> Almost a table
77d3fea73 <Neal Richardson> Hey look a Dataset
f7b92c93b <Neal Richardson> Look for libarrow_dataset
7e43ebf6f <Romain Francois> support for std::vector<std::shared_ptr<T>> in Rcpp function input
a68bb3bfa <Romain Francois> dataset types

Lead-authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Co-authored-by: Romain Francois <romain@rstudio.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>