Commits

David Li authored e86beb86c17
ARROW-12608: [C++][Python][R] Add split_pattern_regex kernel

This adds a split_pattern_regex kernel using RE2.

Caveats:
- RE2 requires us to wrap the user's regex in a capture group in order to actually get the matched delimiter.
- Reverse splitting is not implemented - there's not a good way to do this with RE2.
- In R, strsplit behaves differently - trailing empty splits are no longer dropped:

```
> df <- tibble(x = c("foo bar"))
> (df %>% mutate(x = strsplit(x, "bar")) %>% collect())$x
[[1]]
[1] "foo "

> (record_batch(df) %>% mutate(x = strsplit(x, "bar")) %>% collect())$x
<list<character>[1]>
[[1]]
[1] "foo " ""
```

So the behavior here does not exactly match R. Though this was already the case:

```
> (df %>% mutate(x = strsplit(x, "bar", fixed = TRUE)) %>% collect())$x
[[1]]
[1] "foo "

> (record_batch(df) %>% mutate(x = strsplit(x, "bar", fixed = TRUE)) %>% collect())$x
<list<character>[1]>
[[1]]
[1] "foo " ""
```

Closes #10354 from lidavidm/arrow-12608

Authored-by: David Li <li.davidm96@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
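
For illustration only, here is a rough sketch of the capture-group caveat and the trailing-empty-split behavior described above. It uses Python's standard `re` module as a stand-in for RE2, so it is not the kernel's actual code; the function name `split_on_regex` and its `max_splits` parameter are hypothetical and chosen just for this example. Skipping zero-width matches is also an assumption of the sketch, not something the commit message specifies.

```python
import re


def split_on_regex(text, pattern, max_splits=None):
    """Split `text` on regex `pattern`, keeping trailing empty pieces.

    Hypothetical helper: Python's `re` stands in for RE2 here.
    """
    # Wrap the user's pattern in a capture group so that group 1 always
    # spans the matched delimiter, even if the pattern has its own groups.
    wrapped = re.compile("(" + pattern + ")")

    pieces = []
    piece_start = 0   # where the current piece begins
    search_from = 0   # where the next delimiter search begins
    while max_splits is None or len(pieces) < max_splits:
        match = wrapped.search(text, search_from)
        if match is None:
            break
        delim_start, delim_end = match.span(1)
        if delim_start == delim_end:
            # Zero-width match: skip it so the loop always makes progress
            # (an assumption made for this sketch).
            search_from = delim_end + 1
            if search_from > len(text):
                break
            continue
        pieces.append(text[piece_start:delim_start])
        piece_start = search_from = delim_end

    # The remainder is always emitted, even when it is empty. This is why
    # "foo bar" split on "bar" yields a trailing "".
    pieces.append(text[piece_start:])
    return pieces


if __name__ == "__main__":
    print(split_on_regex("foo bar", "bar"))
    print(split_on_regex("a1b22c", "[0-9]+", max_splits=1))
```

Because the remainder after the last delimiter is always appended, the sketch keeps the trailing empty string, which mirrors the Arrow output shown in the transcripts rather than base R's `strsplit`.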