
Krisztián Szűcs authored 7bc2b0f3579
ARROW-7101: [CI] Refactor docker-compose setup and use it with GitHub Actions

## Projecting ideas from ursabot

### Parametric docker images

The images are better parameterized now, meaning that we can build more variants of the same service. A couple of examples:

```console
UBUNTU=16.04 docker-compose build ubuntu-cpp
ARCH=arm64v8 UBUNTU=18.04 docker-compose build ubuntu-cpp
PYTHON=3.6 docker-compose build conda-python
ARCH=arm32v7 PYTHON=3.6 PANDAS=0.25 docker-compose build conda-python-pandas
```

Each variant has its own docker image following the naming schema:
`{org}/{arch}-{platform}-{platform-version}[[-{variant}-{variant-version}]..]:latest`

### Use `*_build.sh` and `*_test.sh` for each job

The docker images provide the environment, and each language backend should usually implement two scripts, a `build.sh` and a `test.sh`. This way, dependent builds like the docker Python, R or C GLib builds are able to reuse the build script of their ancestor without running its tests. With small enough scripts, even the non-docker builds should be reproducible locally, provided the environment is properly set up.

GitHub Actions supports bash scripts across all three platforms, so we can reuse the same `*_build.sh` and `*_test.sh` scripts to execute the builds in docker, on the CI or locally.

## Using GitHub Actions for running the builds

Regardless of which CI we choose, the isolation constraint of the different platforms requires some sort of virtualisation. Currently Linux (and Windows, but I have not tried it yet) has lightweight containerisation, so we should keep the Linux builds isolated in docker containers. The rest of the platforms (Windows and macOS) should be executed on the CI system.

GitHub Actions supports all three major platforms: Linux, Windows and macOS. I've added cross-platform builds for a couple of languages, like Rust and Go; the rest are work in progress.

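The following Bash sketch makes the image naming schema from the "Parametric docker images" section concrete. The `image_name` function and the `example` organization are hypothetical. They only illustrate how the `{org}/{arch}-{platform}-{platform-version}[[-{variant}-{variant-version}]..]:latest` pattern composes. The real names are set in the docker-compose file and may differ in detail.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the Arrow repository: builds an image name
# following the schema
#   {org}/{arch}-{platform}-{platform-version}[[-{variant}-{variant-version}]..]:latest
# Usage: image_name ORG ARCH PLATFORM PLATFORM_VERSION [VARIANT VARIANT_VERSION]...
image_name() {
  if (( $# < 4 )); then
    echo "usage: image_name ORG ARCH PLATFORM PLATFORM_VERSION [VARIANT VERSION]..." >&2
    return 1
  fi
  local org=$1 arch=$2 platform=$3 platform_version=$4
  shift 4

  # Any remaining arguments must come in VARIANT VERSION pairs.
  if (( $# % 2 != 0 )); then
    echo "image_name: variants must be given as VARIANT VERSION pairs" >&2
    return 1
  fi

  local name="${arch}-${platform}-${platform_version}"
  # Append one -{variant}-{variant-version} suffix per pair, in order.
  while (( $# > 0 )); do
    name+="-$1-$2"
    shift 2
  done

  printf '%s/%s:latest\n' "$org" "$name"
}
```

For example, `image_name example arm64v8 ubuntu 18.04` prints `example/arm64v8-ubuntu-18.04:latest`. Adding `python 3.6 pandas 0.25` as trailing arguments appends `-python-3.6-pandas-0.25` before the tag. Each variant stacks on its parent, so each parameter combination gets a distinct, predictable image name.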
### Workflow

A workflow should define all builds of a language, mostly because the path filters can be defined at the workflow level. For example, the Python builds should be triggered if either a `cpp/**` or a `python/**` file changes, and both filters can be covered in the same workflow file.

## Feature parity with the current builds

Reaching feature parity with all of the builds below is not a goal for this PR, but the difficult ones should at least have a tracking JIRA ticket.

### Travis-CI

- [x] **Lint, Release tests**:
  - `Lint / C++, Python, R, Rust, Docker, RAT`
  - `Dev / Source Release`
- [x] **C++ unit tests w/ conda-forge toolchain, coverage**: without coverage
  - `C++ / AMD64 Conda C++`
- [x] **Python 3.6 unit tests, conda-forge toolchain, coverage**: without coverage
  - `Python / AMD64 Conda Python 3.6`
- [x] **[OS X] C++ w/ Xcode 9.3**:
  - `C++ / AMD64 MacOS 10.14 C++`: with Xcode 10.3
- [x] **[OS X] Python w/ Xcode 9.3**:
  - `Python / AMD64 MacOS 10.14 Python 3`: with Xcode 10.3
- [x] **Java OpenJDK8 and OpenJDK11**:
  - `Java / AMD64 Debian Java JDK 8 Maven 3.5.2`
  - `Java / AMD64 Debian Java JDK 11 Maven 3.6.2`
- [x] **Protocol / Flight Integration Tests**:
  - `Dev / Protocol Test`
- [x] **NodeJS**: without running lint and coverage
  - `NodeJS / AMD64 Debian NodeJS 11`
- [x] **C++ & GLib & Ruby w/ gcc 5.4**:
  - `C++ / AMD64 Debian 10 C++`: with GCC 8.3
  - `C++ / AMD64 Ubuntu 16.04 C++`: with GCC 5.4
  - `C++ / AMD64 Ubuntu 18.04 C++`: with GCC 7.4
  - `C GLib / AMD64 Ubuntu 18.04 C GLib`
  - `Ruby / AMD64 Ubuntu 18.04 Ruby`
- [x] **[OS X] C++ & GLib & Ruby w/ XCode 10.2 & Homebrew**:
  - `C++ / AMD64 MacOS 10.14 C++`: with Xcode 10.3
  - `C GLib / AMD64 MacOS 10.14 C Glib`: with Xcode 10.3
  - `Ruby / AMD64 MacOS 10.14 Ruby`: with Xcode 10.3
- [x] **Go**: without coverage
  - `Go / AMD64 Debian Go 1.12`
- [x] **R (with and without libarrow)**:
  - `R / AMD64 Conda R 3.6`: with libarrow
  - `R / AMD64 Ubuntu 18.04 R 3.6`: with libarrow

### Appveyor

- ~JOB=Build, GENERATOR=Ninja, CONFIGURATION=Release, APPVEYOR_BUILD_WORKER_IMAGE=Visual Studio 2017~
- ~JOB=Toolchain, GENERATOR=Ninja, CONFIGURATION=Release, ARROW_S3=ON, ARROW_BUILD_FLIGHT=ON, ARROW_BUILD_GANDIVA=ON~
- ~JOB=Build_Debug, GENERATOR=Ninja, CONFIGURATION=Debug~
- ~JOB=MinGW32, MINGW_ARCH=i686, MINGW_PACKAGE_PREFIX=mingw-w64-i686, MINGW_PREFIX=c:\msys64\mingw32, MSYSTEM=MINGW32, USE_CLCACHE=false~
- ~JOB=MinGW64, MINGW_ARCH=x86_64, MINGW_PACKAGE_PREFIX=mingw-w64-x86_64, MINGW_PREFIX=c:\msys64\mingw64, MSYSTEM=MINGW64, USE_CLCACHE=false~
- [x] **JOB=Rust, TARGET=x86_64-pc-windows-msvc, USE_CLCACHE=false**:
  - `Rust / AMD64 Windows 2019 Rust nightly-2019-09-25`
- [x] **JOB=C#, APPVEYOR_BUILD_WORKER_IMAGE=Visual Studio 2017, USE_CLCACHE=false**:
  - `C# / AMD64 Windows 2019 C# 2.2.103`
- [x] **JOB=Go, MINGW_PACKAGE_PREFIX=mingw-w64-x86_64 ...**:
  - `Go / AMD64 Windows 2019 Go 1.12`
- ~JOB=R with libarrow, USE_CLCACHE=false, TEST_R_WITH_ARROW=TRUE, RWINLIB_LOCAL=%APPVEYOR_BUILD_FOLDER%\libarrow.zip~

### GitHub Actions

- [x] **Windows MSVC C++ / Build (Visual Studio 16 2019)**:
  - `C++ / AMD64 Windows 2019 C++`: without tests
- [x] **Windows MSVC C++ / Build (Visual Studio 15 2017)**:
  - `C++ / AMD64 Windows 2016 C++`: without tests
- [x] **Linux docker-compose / Test (C++ w/ clang-7 & system packages)**: all have LLVM for Gandiva, but the compiler is set to GCC
  - `C++ / AMD64 Debian 10 C++`: with GCC 8.3
  - `C++ / AMD64 Ubuntu 16.04 C++`: with GCC 5.4
  - `C++ / AMD64 Ubuntu 18.04 C++`: with GCC 7.4
- [x] **Linux docker-compose / Test (Rust)**: without rustfmt
  - `Rust / AMD64 Debian Rust nightly-2019-09-25`
- [x] **Linux docker-compose / Test (Lint, Release tests)**:
  - `Lint / C++, Python, R, Rust, Docker, RAT`
  - `Dev / Source Release`

### Nightly Crossbow tests

The packaging builds are out of the scope of this PR, but the nightly **dockerized test** tasks are in.

Nightly tests:

- [x] docker-r
- [x] docker-r-conda
- [x] docker-r-sanitizer
- [x] docker-rust
- [x] docker-cpp
- [x] docker-cpp-cmake32
- [x] docker-cpp-release
- [x] docker-cpp-static-only
- [x] docker-c_glib
- [x] docker-go
- [x] docker-python-2.7
- [x] docker-python-3.6
- [x] docker-python-3.7
- [x] docker-python-2.7-nopandas
- [x] docker-python-3.6-nopandas
- [x] docker-java
- [x] docker-js
- [x] docker-docs
- [x] docker-lint
- [x] docker-iwyu: included in the lint
- [x] docker-clang-format: included in the lint
- [x] docker-pandas-master
- [x] docker-dask-integration
- [x] docker-hdfs-integration
- [x] docker-spark-integration
- [x] docker-turbodbc-integration

## TODOs left

- [x] Fix the Apidoc generation for c_glib
- [x] Fix the JNI test for Gandiva and ORC
- [x] Test that crossbow tests are passing
- ~Optionally restore the travis configuration to incrementally decommission old builds~

## Follow-up JIRAs

- [Archery] Consider porting the docker tool of ursabot to archery
- [Archery] Consider using archery with or instead of the pre-commit hooks
- [Archery] Create a wrapper script in archery for docker compose in order to run the containers with the host's user and group
- [C++] GCC 5.4.0 has compile errors; reproduce with `UBUNTU=16.04 docker-compose run ubuntu-cpp`
- [C++][CI] Test the ported fuzzit integration image
- [C++][CI] Turn off unnecessary features in the integration tests (spark/turbodbc/dask/hdfs)
- [C++][CI] Revisit the ASAN/UBSAN settings in every C++-based image
- [CI] Consider re-adding the removed Debian testing image
- [Go][CI] Pre-install the Go dependencies in the dockerfile using `go get`
- [JS][CI] Pre-install the JS dependencies in the dockerfile
- [Rust][CI] Pre-install the Rust dependencies in the dockerfile
- [Java][CI] Pre-install the Java dependencies in the dockerfile
- [Ruby][CI] Pre-install the Ruby dependencies in the dockerfile and remove them from the test script
- [C#][CI] Pre-install the C# dependencies in the dockerfile
- [R][CI] Fix the r-sanitizer build https://issues.apache.org/jira/browse/ARROW-6957
- [GLIB][MacOS] Fail to execute lua examples (fails to load 'lgi.corelgilua51' even though lgi is installed)
- [C++][CMake] Automatically set ARROW_GANDIVA_PC_CXX_FLAGS for conda and OSX sdk (see cpp_build.sh)
- [C++][CI] Hiveserver2 integration test fails to connect to the impala container
- [CI][Spark] Support specific Spark versions, including the latest, in the integration test
- [JS][CI] Move nodejs linting from js_build.sh to archery
- [Python][CI] Create a docker image for python ASV benchmarks and fix the script
- [CI] Find a short but related prefix for the env vars used by the docker-compose file to prevent collisions
- [C#] The docker container fails to run because of the ubuntu host versions, see https://github.com/dotnet/core/issues/3509
- [C++][Windows] Enable more features on the Windows GHA build
- [Doc] Document docker-compose usage in the developer sphinx guide
- [CI][C++] Add .ccache to the docker-compose mounts
- [Archery][CI] Refactor the ci/scripts into sourceable bash functions or into archery directly
- [C++][CI] Use scripts/util_coredump.sh to show automatic backtraces
- [C++] Fix the hanging C++ tests in Windows 2019
- [CI] Ask INFRA to set up the DOCKERHUB_* secrets for GitHub Actions
- [C++][CI] Running Gandiva tests fails on Fedora. Reproduce with: `docker-compose run -e ARROW_GANDIVA=ON fedora-cpp`
  ```
  Running gandiva-internals-test, redirecting output into /build/cpp/build/test-logs/gandiva-internals-test.txt (attempt 1/1)
  : CommandLine Error: Option 'x86-experimental-vector-widening-legalization' registered more than once!
  LLVM ERROR: inconsistency in registered CommandLine options
  /build/cpp/src/gandiva
  ```
- [JS][CI] NodeJS build fails on GitHub Actions Windows node
  ```
  > NODE_NO_WARNINGS=1 gulp build
  # 'NODE_NO_WARNINGS' is not recognized as an internal or external command,
  # operable program or batch file.
  # npm ERR! code ELIFECYCLE
  # npm ERR! errno 1
  # npm ERR! apache-arrow@1.0.0-SNAPSHOT build: `NODE_NO_WARNINGS=1 gulp build`
  # npm ERR! Exit status 1
  # npm ERR!
  # npm ERR! Failed at the apache-arrow@1.0.0-SNAPSHOT build script.
  # npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
  ```

Closes #5589 from kszucs/docker-refactor and squashes the following commits:

5105d12e6 <Krisztián Szűcs> Rename pull-request folder to dev_cron
e9e9a7eec <Krisztián Szűcs> Use underscores for naming the workflow files
a92c99d03 <Krisztián Szűcs> Disable hanging C++ tests on windows
f158c89b5 <Krisztián Szűcs> Attempt to push from apache/arrow master; Don't push from crossbow tasks
0e1d470a1 <Krisztián Szűcs> Turn off ORC on macOS C++ test due to link error
258db5cff <Krisztián Szűcs> Only push docker images from apache/arrow repository
acdfcf086 <Krisztián Szűcs> Remove ORC from the brewfile
5102b85b1 <Krisztián Szűcs> Fix nodeJS workflow
032d6a388 <Krisztián Szűcs> Turn off 2 python builds
7f15b97a8 <Krisztián Szűcs> Filter branches
48b8d128a <Krisztián Szűcs> Fix workflows
36ad9d297 <Krisztián Szűcs> Disable builds
0f603af0c <Krisztián Szűcs> master only and cron workflows
28cc2d78d <Krisztián Szűcs> Rename Java JNI workflow
bcd8af7b7 <Krisztián Szűcs> Port the remaining travis utility scripts
ed5688154 <Krisztián Szűcs> Usage comments; recommend installing pandas from the docs because of its removal from conda_env_python
3c8c023ce <Krisztián Szűcs> Use Arch in volumes; some comments; remove conda version 'latest' from the images
771b023a8 <Krisztián Szűcs> Cleanup files; separate JNI builds
97ff8a122 <Krisztián Szűcs> Push docker images only from master
dc00b4297 <Krisztián Szűcs> Enable path filters
e0e2e1f46 <Krisztián Szűcs> Fix pandas master build
3814e0828 <Krisztián Szűcs> Fix manylinux volumes
c18edda70 <Krisztián Szűcs> Add CentOS version to the manylinux image names
c8b9dd6b1 <Krisztián Szűcs> Missing --pyargs argument for the python test command
33e646981 <Krisztián Szűcs> Turn off gandiva and flight for the HDFS test
b9c547889 <Krisztián Szűcs> Refactor docker-compose file and use it with github actions.

Authored-by: Krisztián Szűcs <szucs.krisztian@gmail.com>
Signed-off-by: Krisztián Szűcs <szucs.krisztian@gmail.com>