Commits

Neal Richardson authored 903bb8c9ff7
ARROW-6978: [R] Add bindings for sum and mean compute kernels Was curious how easy this would be and what issues I'd run into. So far I've done `sum()` and `mean()`, and I hit some roadblocks in doing min/max. Issues: * See workarounds in [sum.Array](https://github.com/apache/arrow/compare/master...nealrichardson:r-sum?expand=1#diff-695287341c6d5011a12d7d9bd3ae07adR350-R357). (1) missing/null values are always dropped, while R provides an option for how to treat missingness. Interestingly, it looks like min and max do have this null option supported (in `MinMaxOptions`) but sum and mean do not (yet).([ARROW-9054](https://issues.apache.org/jira/browse/ARROW-9054)) * (2) There is no sum method for boolean type ([ARROW-9055](https://issues.apache.org/jira/browse/ARROW-9055)) * There is no sum implemented for Scalars, which may be fine (I would expect it to be no-op for numeric types, just thought logically it maybe shouldn't error) ([ARROW-9056](https://issues.apache.org/jira/browse/ARROW-9056)) * For min and max, the path is to call the minmax kernel, which returns a struct containing min and max fields, and then extract the appropriate field from the struct. However, StructScalar doesn't seem to have field accessing methods like StructArray has ([ARROW-8769](https://issues.apache.org/jira/browse/ARROW-8769)) * I also ran into the fact that MakeArrayFromScalar can't handle structs (previously ticketed as [ARROW-6604](https://issues.apache.org/jira/browse/ARROW-6604)), which means I can't convert a ScalarStruct to an R object (data.frame). Not required for these kernel bindings though if ARROW-9070 happens. Also: * Summing integers seems to promote to return int64 if given int32 (I didn't try with smaller ints), even when overflow is not a danger (I was adding numbers 1 to 5). It would be nice if it returned the same type it got unless it has to go bigger to avoid overflow. * Given how flexible it is to use CallFunction on a new function without having to write any additional C++ binding, it was surprising/frustrating that in order to call minmax, I had to first add boilerplate to instantiate default MinMaxOptions https://github.com/apache/arrow/pull/7308/files#diff-ffc5e6f7dfee7f9bed7733b7fb41e632R163-R167. I would expect that kernels would be able to instantiate their own default options if called with no options. ([ARROW-9091](https://issues.apache.org/jira/browse/ARROW-9091)) Closes #7308 from nealrichardson/r-sum Authored-by: Neal Richardson <neal.p.richardson@gmail.com> Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>