Commits

Neal Richardson authored 5227b78300c
ARROW-6793: [R] Arrow C++ binary packaging for Linux

With this patch, `install.packages("arrow")` on Linux should now result in a functioning R package.

On installation, the R `configure` script will, if Arrow C++ is not found locally, attempt to download a prebuilt static C++ library that corresponds to the local OS and the R package version; if not found, it will download the C++ source (or look for it in a local git checkout) and attempt to compile it. In this latter case, installation will be slow, but it should work--the script in `r/inst/build_arrow_static.sh` makes a bundled, static build. Any additional build-time dependencies (cmake, flex and bison for thrift) will be downloaded and built if necessary. If all of this fails, then the current "arrow without arrow" package is built.

The `r` docker-compose service this patch adds is set up such that we can test against any Docker image containing R on Docker Hub. The GitHub Actions workflow and crossbow nightly tasks added here test the C++ source building on 7 distro/versions. This testing uncovered a few sharp corners in the bundled static cmake build, which have been noted/ticketed/fixed.

I summarized the installation workflow and the ways to control and debug it in a vignette: `r/vignettes/install.Rmd`. That document also points to the code/scripts that do the work.

To support this workflow and provide nightly builds that will prove that it works, I am separately setting up some CI at https://github.com/ursa-labs/arrow-r-nightly, where we already build nightly macOS and Windows binary packages. This CI

* Builds static binaries for some set of distribution-versions (e.g. "ubuntu-18.04", "centos-7", etc.) and hosts them at `https://dl.bintray.com/ursa-labs/arrow-r/libarrow/bin/$DISTRO-$OS_VERSION/arrow-$PKG_VERSION.zip`.
* Hosts C++ source snapshots corresponding to the R package versions (both CRAN releases and nightlies) at `https://dl.bintray.com/ursa-labs/arrow-r/libarrow/src/arrow-$PKG_VERSION.zip`. If not found for $PKG_VERSION, `configure` will then look to download the official Apache Arrow source release. This "fallback" will enable CRAN releases to work without the Ursa bintray, while having it first look to bintray will allow us to patch the source if necessary after the official Apache release, which has been necessary for past R releases (and permitted since R packages are not officially voted on).
* Tests that those work.

To be clear, the changes in this patch do not require the existence of that external build infrastructure. Without the nightly/binary builds, this patch still allows `R CMD INSTALL` from inside the git repository to Just Work, and future CRAN releases should also download the official Apache Arrow source release and build the C++ from source. So at least that gets our users a simple (albeit slow) installation experience, and one that works out of the box from a git checkout. The binary builds will improve the installation experience when they exist, and this patch includes the hooks to find and use those prebuilt libraries.

Note that the intent is that this extra C++ installation/compilation will not happen on CRAN itself, only when users install the package themselves. We can relax that later once we have more confidence that the binaries we build will work widely, but restricting it on CRAN does not affect the experience of users who install the package.

Aside: the original idea (discussed on Jira) of building a single "manylinux" Arrow C++ library to use with the R package proved not to work.

Closes #6068 from nealrichardson/r-manylinux and squashes the following commits:

ac7680c3f <Neal Richardson> ARROW_USE_PKG_CONFIG
7d32bc1af <Neal Richardson> cmake list(APPEND ...)
cd04937f1 <Neal Richardson> :nail_care:
518d9774d <Neal Richardson> We're putting binaries under bin/
00484c636 <Neal Richardson> Make thirdparty verbosity conditional
fa5bdf12c <Neal Richardson> Swap out GHA/crossbow jobs because the old one caches better/is faster
29505fb60 <Neal Richardson> Copy-paste better
471463452 <Neal Richardson> Name GHA jobs to match the docker image name
0d3239583 <Neal Richardson> Fix revert mess
ffa2eaf73 <Neal Richardson> Reorganize R CI jobs and wire new ones in crossbow
38bac3e6b <Neal Richardson> Fill in some GHA self references
cf32c3ce9 <Neal Richardson> Remove docker from various docker-compose variable names
24f3bba25 <Neal Richardson> Revert "Skip brew test" (oops, wrong branch)
79e4793f2 <Neal Richardson> Fix for file move
692efc5d4 <Neal Richardson> Resolve some PR feedback
611024253 <Neal Richardson> Revert "Try downgrading m4"
efef71e86 <Neal Richardson> Try downgrading m4
f8ffcfd17 <Neal Richardson> which ninja
4c5e5d510 <Neal Richardson> Better test fix/skip
93817563b <Neal Richardson> Skip these new tests too for ARROW-7500
83cb0802b <Neal Richardson> Remove accidental checkin of cpp zip file
e0cd12870 <Neal Richardson> Oops again
7504e3dae <Neal Richardson> Oops.
26dee636f <Neal Richardson> Cleanup and update docs
5695a1cf8 <Neal Richardson> Re-skip fedora for now
11664d257 <Neal Richardson> See if downgrading flex solves the segfault
79b208bcb <Neal Richardson> Remove centos8, add fedora
1d284281d <Neal Richardson> temporarily skip hive test
d52d353aa <Neal Richardson> Fix bison installation
75f2e60d2 <Neal Richardson> More debugging (curse you thrift)
d6c81e8b2 <Neal Richardson> Debug m4
fc6cbf2ba <Neal Richardson> Try putting bison on path
801e20748 <Neal Richardson> Better detect OS for RSPM binaries. Build bison/m4 if necessary
68d232029 <Neal Richardson> Add R version to rstudio docker
ce7ae24ff <Neal Richardson> Oops
908c3e9f5 <Neal Richardson> Add rstudio builds to matrix; generalize docker-compose job
d156fb1cd <Neal Richardson> Docs
970e6cc5c <Neal Richardson> Build triage and start updating vignette
f71034c3b <Neal Richardson> Oops
fb27bf86e <Neal Richardson> Try to find R better
111fa069f <Neal Richardson> Some fixes from the build matrix
f1672b57f <Neal Richardson> utils:: and let's not fail fast for now
364f63956 <Neal Richardson> Fix dockerhub name for centos image
eaefdce5b <Neal Richardson> Add R jobs to GHA
801863bbe <Neal Richardson> :rat:
b256c27d6 <Neal Richardson> Refactor linuxlibs.R and add r-hub builders to docker-compose
6f07ebf7a <Neal Richardson> More tempfile
07b21c5cf <Neal Richardson> Alphabetize cmake flags
ac5eff60e <Neal Richardson> More iteration on linux build script: works on rhub/ubuntu-cpp-release
5e9382b31 <Neal Richardson> Download cmake if necessary
225902622 <Neal Richardson> Apply suggestions from @kou
c323a1c80 <Neal Richardson> Prune some exploratory work
4838938d4 <Neal Richardson> Add vignette explaining how Linux installation works
52a04658f <Neal Richardson> Sketch out full logic of linux install script
655f8de27 <Neal Richardson> Allow r/configure to build cpp libs if they're found at ../cpp
fb21b156d <Neal Richardson> Make sure the full test suite works
ad632e66f <Neal Richardson> Restore build flags
184082a75 <Neal Richardson> Fix ninja detection
aea397983 <Neal Richardson> Remove ninja from travis
ae7571c6f <Neal Richardson> Don't fail without ninja
be61f153f <Neal Richardson> Local debugging
a347a44ec <Neal Richardson> Some debug
ac2418654 <Neal Richardson> Experiment with calling install script from R package
02264e239 <Neal Richardson> Move build script to r/tools
c51c81730 <Neal Richardson> Refine build_arrow_static.sh and add wiring in r/configure to consume the static lib
e2c3238ce <Neal Richardson> Attempt to build static lib based on manylinux2014

Authored-by: Neal Richardson <neal.p.richardson@gmail.com>
Signed-off-by: Neal Richardson <neal.p.richardson@gmail.com>
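
Illustration only, not part of the patch: a minimal Python sketch of the fallback order that the commit message describes for `configure` on Linux. The real logic lives in the R `configure` script and the scripts the vignette points to. Everything named here is an assumption made for the example: the function names, the step labels, the version string `0.15.1`, and the boolean stand-ins for "did this step succeed". Only the two bintray URL templates are taken from the message above.

```python
from typing import Callable, List, Tuple

# Base URL taken from the commit message; the helper names below are hypothetical.
BINTRAY = "https://dl.bintray.com/ursa-labs/arrow-r/libarrow"


def binary_url(distro: str, os_version: str, pkg_version: str) -> str:
    """Prebuilt static library location: bin/$DISTRO-$OS_VERSION/arrow-$PKG_VERSION.zip."""
    return f"{BINTRAY}/bin/{distro}-{os_version}/arrow-{pkg_version}.zip"


def source_url(pkg_version: str) -> str:
    """C++ source snapshot location: src/arrow-$PKG_VERSION.zip."""
    return f"{BINTRAY}/src/arrow-{pkg_version}.zip"


def resolve_libarrow(steps: List[Tuple[str, Callable[[], bool]]]) -> str:
    """Run each step in order and return the label of the first one that succeeds.

    If every step fails, fall back to the "arrow without arrow" package,
    as the commit message describes.
    """
    for label, attempt in steps:
        if attempt():
            return label
    return "arrow without arrow"


if __name__ == "__main__":
    # Stand-in callables: a real implementation would probe the system,
    # download archives, and run the static build script.
    steps = [
        ("local Arrow C++", lambda: False),
        ("prebuilt binary", lambda: False),
        ("source snapshot build", lambda: True),
        ("official Apache source release build", lambda: True),
    ]
    print(binary_url("ubuntu", "18.04", "0.15.1"))
    print(source_url("0.15.1"))
    print(resolve_libarrow(steps))
```

Keeping the steps as an ordered list of callables mirrors the "try, then fall back" wording of the message and makes the final "arrow without arrow" outcome the explicit default rather than a special case.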