Commits


Neal Richardson authored and Jonathan Keane committed 7eba11595c9
ARROW-13860: [R] arrow 5.0.0 write_parquet throws error writing grouped data.frame

* `Table/RecordBatch$create()` on `grouped_df` no longer returns an `arrow_dplyr_query`, which was the change in the last release. This means these functions are type stable again, and this fixes the user report that write_parquet() doesn't work.
* Instead of creating `arrow_dplyr_query`, group vars are stored in a special `.group_vars` attribute in the `metadata$r`. This attribute is used to restore groups on the round trip back to R, so `grouped_df %>% record_batch() %>% as.data.frame()` returns a `grouped_df`.
* The current dplyr release caches a lot of metadata about groups in a `grouped_df`, including all row indices matching each group value. This bloated the schema metadata we serialize, so it has been removed here. When converting back to a `grouped_df`/`data.frame`, dplyr will recreate this metadata.
* The `group_vars()` and `ungroup()` methods for `ArrowTabular` read/write this new `metadata$r$attributes$.group_vars` field, so `df %>% group_by() %>% record_batch() %>% group_vars()` returns the same as `df %>% record_batch() %>% group_by() %>% group_vars()`. `arrow_dplyr_query()` also picks up on it.
* New helper active binding `$r_metadata` to wrap the (de)serialization into the Arrow string KeyValueMetadata.

Closes #11315 from nealrichardson/fix-grouped-df

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Jonathan Keane <jkeane@gmail.com>
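
The round trip described in this commit message can be sketched in R as follows. This is a minimal illustration, not part of the commit itself: it assumes an arrow release that includes this change plus dplyr, and the data frame and its column names `g` and `x` are hypothetical.

```r
library(arrow)
library(dplyr)

# Hypothetical example data; `g` and `x` are illustrative names only.
df <- data.frame(g = c("a", "a", "b"), x = 1:3)
grouped <- group_by(df, g)

# Per the commit message, this should be a RecordBatch again,
# not an arrow_dplyr_query.
batch <- record_batch(grouped)
class(batch)

# The grouping variables are read back from the R metadata on the batch.
group_vars(batch)

# Converting back to R is expected to restore the groups.
restored <- as.data.frame(batch)
group_vars(restored)

# The originally reported failure: writing a grouped data.frame to Parquet.
path <- tempfile(fileext = ".parquet")
write_parquet(grouped, path)
```

If the behavior matches the description above, `group_vars()` reports the same grouping column on `batch` and on `restored`, and the `write_parquet()` call completes without error.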