
Neal Richardson authored 9dec79bf0c1
ARROW-5505: [R] Normalize file and class names, stop masking base R functions, add vignette, improve documentation

The main thrust of the changes is summarized in the new vignette:

> C++ is an object-oriented language, so the core logic of the Arrow library is encapsulated in classes and methods. In the R package, these classes are implemented as `R6` reference classes, most of which are exported from the namespace.
>
> In order to match the C++ naming conventions, the `R6` classes are in TitleCase, e.g. `RecordBatch`. This makes it easy to look up the relevant C++ implementations in the [code](https://github.com/apache/arrow/tree/master/cpp) or [documentation](https://arrow.apache.org/docs/cpp/). To simplify things in R, the C++ library namespaces are generally dropped or flattened; that is, where the C++ library has `arrow::io::FileOutputStream`, it is just `FileOutputStream` in the R package. One exception is for the file readers, where the namespace is necessary to disambiguate. So `arrow::csv::TableReader` becomes `CsvTableReader`, and `arrow::json::TableReader` becomes `JsonTableReader`.
>
> Some of these classes are not meant to be instantiated directly; they may be base classes or other kinds of helpers. For those that you should be able to create, use the `$create()` method to instantiate an object. For example, `rb <- RecordBatch$create(int = 1:10, dbl = as.numeric(1:10))` will create a `RecordBatch`. Many of these factory methods that an R user might most often encounter also have a `snake_case` alias, in order to be more familiar for contemporary R users. So `record_batch(int = 1:10, dbl = as.numeric(1:10))` would do the same as `RecordBatch$create()` above.
>
> The typical user of the `arrow` R package may never deal directly with the `R6` objects. We provide more R-friendly wrapper functions as a higher-level interface to the C++ library. An R user can call `read_parquet()` without knowing or caring that they're instantiating a `ParquetFileReader` object and calling the `$ReadFile()` method on it. The classes are there and available to the advanced programmer who wants fine-grained control over how the C++ library is used.

There are a few other fixes and cleanups rolled in here, named in the individual commit messages below. I stopped short of further documentation consolidation because (1) this patch is already huge and (2) `R6` classes are really tedious to document because it is all done manually. I did some searching and found open issues from 2014 and 2015 about better R6 support in roxygen2.

Closes #5279 from nealrichardson/cleaner-class-names and squashes the following commits:

3c6f85bfb <Neal Richardson> :rat:
22c9d0420 <Neal Richardson> More doc cleaning
01084ce7d <Neal Richardson> Factor out assert_is()
caf3265d3 <Neal Richardson> PR feedback from romain
adf1cf916 <Neal Richardson> File renaming (not case-sensitive)
35f00f52d <Neal Richardson> Rename Table.R to table.R
8bd52d722 <Neal Richardson> Rename Struct.R to struct.R
358290bc6 <Neal Richardson> Rename Schema.R to schema.R
924edd1c4 <Neal Richardson> Rename List.R to list.R
0150d9923 <Neal Richardson> Rename Field.R to field.R
8683f100f <Neal Richardson> Add content to vignette from blog post
e6b75f4e0 <Neal Richardson> Consolidate and document reader/writer classes; also fix ARROW-6449
495abf663 <Neal Richardson> Fill in documentation and standardize file naming
5fd49ef4b <Neal Richardson> Fix check failures
96873e1cd <Neal Richardson> Factor out make_readable_file
3e4cfe71c <Neal Richardson> Clean up parquet classes and document the R6
85a8d3631 <Neal Richardson> Start vignette draft explaining the class and naming conventions
71cac57aa <Neal Richardson> Clean up Rd file names, experiment with documenting constructors, and start updating pkgdown
2d1b73875 <Neal Richardson> Replace table() with Table()
b6945114f <Neal Richardson> Remove defunct Column class
730313e3a <Neal Richardson> One more find/replace, esp. RecordBatch*
702a0b162 <Neal Richardson> Message
365fedc4f <Neal Richardson> feather
0e7877b71 <Neal Richardson> Drop ::ipc::
55607a6a8 <Neal Richardson> json
9bd708fd0 <Neal Richardson> csv
fbebf2734 <Neal Richardson> io
1711d3e08 <Neal Richardson> CastOptions
12031adda <Neal Richardson> Backfill some methods
407589767 <Neal Richardson> compression
3b4b49218 <Neal Richardson> ChunkedArray
bbf07993c <Neal Richardson> Buffer
3f1cd7184 <Neal Richardson> Object
9fbecda46 <Neal Richardson> A few more backticks
8edf08562 <Neal Richardson> Remove more backticks
1f6d154e4 <Neal Richardson> Replace array() with Array()
9f52490a0 <Neal Richardson> Progress commit renaming Array

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>