lafiona authored and GitHub committed 8861c0c8b2a
ARROW-15691: [Dev] Update archery to work with either master or main as default branch (#14033)

# Overview

The goal of this pull request is to update `archery` to work with a repository default branch named either `master` or `main`, as part of the effort to rename the Apache Arrow repository's default branch to `main`. The parent Jira ticket can be found [here](https://issues.apache.org/jira/browse/ARROW-15689).

# Implementation

- Update the language of the top-level `archery`, `crossbow`, and `docker` command-line interface code to refer to the mainline development branch (the default git branch) generically.
- Update comments that reference the `master` branch.
- Update the `crossbow` benchmarking examples to specify `<default-branch>` generically rather than a hard-coded value.
- In `.github/workflows/integration.yml`, add an environment variable, `DEFAULT_BRANCH`, to the `archery` command in the "Execute Docker Build" step, so that `archery` can reliably access the default branch value.
- In `.github/workflows/archery.yml`, add an environment variable, `DEFAULT_BRANCH`, for all steps. This environment variable was already used by the `Git Fixup` step; it will now also be used by the `Archery Unittests` step.
- Add a property, `default_branch_name`, to the `Repo` class in `dev/archery/archery/crossbow/core.py` for computing the default branch name.
  - If specified, the `DEFAULT_BRANCH` environment variable takes precedence in determining the default branch name (this overrides the git-based heuristic and is intended for validation in CI).
  - Otherwise, `pygit2` is used to get the default branch name from the `HEAD` reference of the Apache Arrow repository's `origin` remote. This is a heuristic, but in most cases the remote's `HEAD` reference points to the default branch.
- Add a cached property, `default_branch`, to the `Release` class in `dev/archery/archery/release/core.py` for computing the default branch name. Similar to the `default_branch_name` property of `Repo` in `dev/archery/archery/crossbow/core.py`:
  - If specified, the `DEFAULT_BRANCH` environment variable takes precedence in determining the default branch name (this is for validation in CI).
  - Otherwise, `GitPython` is used to get the default branch name from the `HEAD` reference of the Apache Arrow repository's `origin` remote.
- Change the `PANDAS` and `DASK` Docker build parameter value that indicates the upstream development branch to `upstream_devel`.
- Update the "Running Docker Builds" development documentation to reflect the above change, and fix a broken link.

#### Out of scope:

- There are remaining instances of `master` in the test fixture files in `dev/archery/archery/test/fixtures`. The data appears to refer only to external repositories, such as `ursa-labs/ursabot`, which currently uses `master`, so these instances were not modified.

# Testing

- Ran the `archery` and `crossbow` commands in local clones of both the `mathworks/arrow` and `apache/arrow` repositories.
- Confirmed that the GitHub CI jobs pass.
- We are unsure how to validate the changes to the `release` component locally, but the `release` tests pass in CI.
- Ran a sample `archery docker` command after setting the `PANDAS` environment variable to confirm that the correct version of Pandas is used.

# Future Directions

1. Added a Jira task to update the pull request merge script to work with both `master` and `main` ([ARROW-17777](https://issues.apache.org/jira/browse/ARROW-17777)).
2. Added a Jira task to update the default value for the default branch name, used when it cannot be determined from an environment variable (`ARCHERY_DEFAULT_BRANCH`) or from the repository's remote head reference ([ARROW-18011](https://issues.apache.org/jira/browse/ARROW-18011)).

# Notes

Thank you @ kevingurney for your help with this pull request!
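The default-branch lookup described under Implementation (environment variable first, then the `origin` remote's `HEAD` reference) can be sketched in Python as below. This is a minimal illustration, not archery's actual code: the names `resolve_default_branch`, `read_origin_head_target`, and `ReleaseSketch` are hypothetical, the `"master"` fallback is an assumption (ARROW-18011 tracks changing that default), and stripping the `refs/remotes/origin/` prefix is just one way to turn the symbolic reference target into a branch name. Only the `pygit2` reader touches git; the resolver itself is a pure function.

```python
import functools
import os
from typing import Callable, Mapping, Optional

# Prefix of remote-tracking refs for the "origin" remote.
ORIGIN_PREFIX = "refs/remotes/origin/"


def resolve_default_branch(
    env: Mapping[str, str],
    origin_head_target: Optional[str],
    fallback: str = "master",  # assumption: the current hard-coded fallback
) -> str:
    """Pick the default branch name (hypothetical helper)."""
    # 1. An explicit, non-empty DEFAULT_BRANCH override wins (used in CI).
    override = env.get("DEFAULT_BRANCH")
    if override:
        return override
    # 2. Otherwise derive the name from origin/HEAD's symbolic target,
    #    e.g. "refs/remotes/origin/main" -> "main".
    if origin_head_target and origin_head_target.startswith(ORIGIN_PREFIX):
        return origin_head_target[len(ORIGIN_PREFIX):]
    # 3. Otherwise fall back to a hard-coded value.
    return fallback


def read_origin_head_target(path: str = ".") -> Optional[str]:
    """Return the target of refs/remotes/origin/HEAD, or None (hypothetical)."""
    import pygit2  # lazy import: resolve_default_branch needs no git library

    repo = pygit2.Repository(path)
    try:
        ref = repo.references["refs/remotes/origin/HEAD"]
    except KeyError:
        # This clone has no origin/HEAD reference.
        return None
    # For a symbolic reference, .target is the referenced ref's name (a str);
    # a direct reference would yield an Oid instead.
    target = ref.target
    return target if isinstance(target, str) else None


class ReleaseSketch:
    """Hypothetical stand-in for Release, showing the cached property."""

    def __init__(self, head_reader: Callable[[], Optional[str]]):
        self._head_reader = head_reader

    @functools.cached_property
    def default_branch(self) -> str:
        # Evaluated on first access only; later accesses reuse the stored value.
        return resolve_default_branch(os.environ, self._head_reader())
```

In a clone with `pygit2` installed, `resolve_default_branch(os.environ, read_origin_head_target("."))` would return the branch that `origin/HEAD` points to. If a clone lacks `refs/remotes/origin/HEAD`, `git remote set-head origin --auto` can create it. Stripping the full `refs/remotes/origin/` prefix, rather than keeping only the last `/`-separated token, keeps branch names that contain slashes intact.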
Lead-authored-by: Fiona La <fionala7@gmail.com>
Co-authored-by: Kevin Gurney <kgurney@mathworks.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>