Commits


Dewey Dunnington authored and GitHub committed a75299d8cb2
GH-35649: [R] Always call `RecordBatchReader::ReadNext()` from DuckDB from the main R thread (#36307)

### Rationale for this change

When passing a DuckDB result whose input was an Arrow dataset to Arrow via `to_arrow()`, some DuckDB operations can call back into R from threads other than the main R thread. This caused a crash or hang on Linux when attempting to combine `pivot_longer()` and `write_dataset()`.

### What changes are included in this PR?

- Added a wrapper class around the `RecordBatchReader` that routes calls to `ReadNext()` through `SafeCallIntoR()` (a hedged sketch of this pattern appears after the commit message below).

### Are these changes tested?

I can't find a new case that isn't covered by our existing tests, although I did remove a skip that was causing a similar problem at one point (#33033). Because it's difficult to predict or test where DuckDB evaluates R code, it's hard to know exactly what to test here (I would have expected R code to be evaluated, and a crash to occur, with many of our existing tests, but even the `pivot_longer()` example does not crash on macOS and Windows 🤷). I did verify on Ubuntu 22.04 that the reprex kindly provided by @PMassicotte errors before this PR and does not error after it:

```r
library(tidyverse)
library(arrow)

one_level_tree <- tempfile()

mtcars |>
  to_duckdb() |>
  pivot_longer(everything()) |>
  to_arrow() |>
  # collect() |> # collecting makes it work; otherwise it hangs on write_dataset()
  write_dataset(one_level_tree, partitioning = "name")

list.files(one_level_tree, recursive = TRUE)
```

### Are there any user-facing changes?

There are no user-facing changes.

* Closes: #35649

Authored-by: Dewey Dunnington <dewey@voltrondata.com>
Signed-off-by: Nic Crane <thisisnic@gmail.com>
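
For illustration, here is a minimal sketch of the wrapper pattern described above, not the actual patch. It assumes a `SafeCallIntoR<T>()` helper that evaluates a callable on the main R thread and returns `arrow::Result<T>`; the declaration shown for that helper, the class name `MainRThreadRecordBatchReader`, and the `reason` string are illustrative guesses rather than the package's real code.

```cpp
#include <functional>
#include <memory>
#include <string>
#include <utility>

#include <arrow/record_batch.h>
#include <arrow/result.h>
#include <arrow/status.h>

// Assumed shape of the package's helper, declared here only so the sketch is
// self-contained: it runs `fun` on the main R thread and returns its result.
// The real declaration lives in the arrow R package, not in this sketch.
template <typename T>
arrow::Result<T> SafeCallIntoR(std::function<T()> fun, const std::string& reason);

// Hypothetical wrapper: delegates to another RecordBatchReader but routes
// ReadNext() through the main R thread, so any R evaluation that DuckDB
// triggers while producing the next batch happens where calling into R is safe.
class MainRThreadRecordBatchReader : public arrow::RecordBatchReader {
 public:
  explicit MainRThreadRecordBatchReader(
      std::shared_ptr<arrow::RecordBatchReader> parent)
      : parent_(std::move(parent)) {}

  std::shared_ptr<arrow::Schema> schema() const override {
    return parent_->schema();
  }

  arrow::Status ReadNext(std::shared_ptr<arrow::RecordBatch>* batch_out) override {
    // Run the parent's ReadNext() on the main R thread. The outer Result
    // reports whether the call into R itself succeeded; the inner Status is
    // whatever the wrapped reader returned.
    arrow::Result<arrow::Status> result = SafeCallIntoR<arrow::Status>(
        [this, batch_out]() { return parent_->ReadNext(batch_out); },
        "ReadNext() on a RecordBatchReader passed from DuckDB");
    ARROW_RETURN_NOT_OK(result.status());
    return result.MoveValueUnsafe();
  }

 private:
  std::shared_ptr<arrow::RecordBatchReader> parent_;
};
```

Keeping the outer `Result` separate from the inner `Status` distinguishes a failure to call into R at all from an error reported by the wrapped reader itself.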