Automatic coordinated rebase with changeset evolution and Mercurial

02.03.2020 | Arne Babenhauserheide

The last article described different workflows for continuous integration that keep the master branch green. It investigated their cost as delayed visibility: how many commits you have to merge together when integrating your work into master.

The strategy with the lowest cost is a coordinated backout and rebase, where the CI can remove commits that break the build. But when using Git, that requires lots of manually followed rebases and blacklisting of commits. To be a viable choice, the workflow would have to be automated and guaranteed. And that is hard with Git.

Enter Mercurial

Still, there is a way to automate the workflow, and it works seamlessly. But there’s a catch: You’ll have to switch to Mercurial with the evolve extension. The good thing is, Mercurial is not so different from Git (there are several command mapping articles), and if you already work with trunk-based development, then your life will get easier.

The evolve extension enables coordinated history rewriting. For Git there is the rule “do not force-push into a branch others may already have pulled from”. If you do, you break the history of your collaborators and all of them have to rebase. The evolve extension automates such rebases (and more), so you can collaboratively rewrite history.

For setup, you need a shared repository set to non-publishing. Normally when you push a change to a remote server, Mercurial will mark the pushed commits as published and prevent you from modifying them. That way you won’t have nasty surprises like accidentally rewriting a tagged commit (if you need to, use hg phase --draft --force --rev REV to allow rewriting temporarily). In a non-publishing repository, pushed commits stay in the draft phase, so the CI can still remove them.
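The non-publishing setting is a two-line entry in the repository's hgrc. The following is a minimal sketch of a helper that writes it; the function name and the path shared-repo are illustrative assumptions, while the [phases] section and the publish key are the ones used in the complete example further down.

```shell
# Sketch: mark a repository as non-publishing so pushed commits stay drafts.
set_non_publishing() {
    local repo="$1"
    # printf interprets \n the same way in every POSIX shell.
    printf '[phases]\npublish = False\n' >> "$repo/.hg/hgrc"
}

# Usage, after creating the repository with "hg init shared-repo"
# (shared-repo is a hypothetical path):
#   set_non_publishing shared-repo
```

Appending instead of overwriting keeps any settings the repository already has.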

The important pieces of the workflow are how the CI removes a commit for which tests failed (it prunes it), how developers follow that change, and how the one who pushed the bad commit fixes it.

The CI prunes a bad commit from history

We have a history with commits affe23 (bad) and fee42 (good). Commit fee42 is our HEAD, marked by the @.

# @ fee42
# | summary: fee42: good
# * affe23
# | summary: affe23: bad
# ...

Our CI (Jenkins) tests commit affe23 and finds it bad, so it prunes it from history. Commit fee42 is then rebased onto the parent of affe23 (the complete example below runs hg evolve for this), so it gets a new commit hash (ddd).

cd jenkins; hg prune -r affe23; hg evolve
# @ ddd
# | summary: fee42: good
# ...
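The CI step for a single failing commit could be wrapped like this. It is only a sketch that mirrors the commands of the complete example further down; the function name and the log-file argument are assumptions made for illustration, and the log file stands in for the notification mail.

```shell
# Sketch: remove one failing commit on the CI side and remember its hash.
handle_failing_commit() {
    local rev="$1"
    local log="$2"
    # hg prune marks the commit as obsolete (provided by the evolve extension).
    hg prune -r "$rev" || return 1
    # hg evolve rebases descendants that the prune left behind as orphans.
    hg evolve
    # Record the removed commit so its author can be told to graft and fix it.
    echo "$rev" >> "$log"
}

# Usage inside the CI repository (hash taken from the walkthrough):
#   handle_failing_commit affe23 failing_commits.log
```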

A developer follows the change

A developer whose repository still contains the bad commit wants to push the new commit abc, so the developer pulls and tries to push.

(cd dev; hg pull --rebase; hg push)

During the push the developer sees an error:

# abort: push includes orphan changeset: affe23
#     (use 'hg evolve' to get a stable history or --force to ignore warnings)
# @ abc
# | summary: abc
# * fee42
# | summary: fee42: good
# x affe23
# | summary: affe23: bad
# ...

Here x marks an obsolete commit, a commit that got removed. The commits following it are orphans and need to be rebased.
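To inspect the situation before fixing it, the affected commits can be listed with Mercurial's obsolete() and orphan() revsets. This is a hedged sketch: the function name is made up for illustration, and it assumes a reasonably recent Mercurial in which the orphan() revset is available.

```shell
# Sketch: show which commits were removed and which ones still build on them.
show_evolution_state() {
    echo "obsolete (removed) commits:"
    # --hidden is needed because obsolete commits can be hidden from normal logs.
    hg log --hidden -r 'obsolete()' --template '{node|short} {desc|firstline}\n'
    echo "orphans (still based on an obsolete commit):"
    hg log -r 'orphan()' --template '{node|short} {desc|firstline}\n'
}

# Usage inside the developer repository:
#   show_evolution_state
```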

The developer resolves the error by using evolve to automatically follow the rebase. This removes the bad change from history. Then the developer pushes.

(cd dev; hg evolve --any; hg push)

Now commit affe23 is removed from history, and abc got rebased on top of the new fee42:

# @ ccc
# | summary: abc
# * ddd
# | summary: fee42: good
# ...
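The two developer commands can be folded into one helper that only runs evolve when the push is rejected. This is a sketch under assumptions: the function name is illustrative, hg pull --rebase needs the rebase extension enabled, and a rejected push can have causes that evolve does not fix.

```shell
# Sketch: pull, try to push, and evolve only if the push was refused.
push_with_evolve() {
    hg pull --rebase
    if ! hg push; then
        # hg push also returns non-zero when there is nothing to push;
        # in that case evolve finds nothing to do and the helper fails.
        hg evolve --any && hg push
    fi
}

# Usage inside the developer repository:
#   push_with_evolve
```

Retrying the push only after a successful evolve avoids hiding other push errors behind a second attempt.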

The bug is fixed

The author of affe23 receives an email from the CI with the failing commit:

jenkins: “build failed. Removed commit affe23. Please graft and fix it with hg graft --hidden affe23”

hg graft is like git cherry-pick:

(cd devX; hg graft --hidden affe23)

Now the changes from commit affe23 are back in history as the most recent commit, but with a new commit hash (bbb).

The developer adds commit aaa that fixes the tests and pushes:

(cd devX; hg commit -m "fix affe23"; hg push)
# @ aaa
# | summary: fix affe23
# * bbb
# | summary: affe23: bad
# * ccc
# | summary: abc
# * ddd
# | summary: fee42: good
# ...
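When several commits were pruned, the grafting can be scripted. The sketch below assumes a file with one pruned commit hash per line, oldest first, similar to the failing_commits.log used in the complete example; the function name is an illustrative assumption.

```shell
# Sketch: graft every pruned commit listed in a file onto the current head.
graft_failing_commits() {
    local list="$1"
    local rev
    # Read the list on file descriptor 3 so hg keeps its normal stdin.
    while read -r rev <&3; do
        [ -n "$rev" ] || continue
        # --hidden lets graft see the pruned (obsolete) commit.
        hg graft --hidden "$rev" || return 1
    done 3< "$list"
}

# Usage inside the repository of the commit's author:
#   graft_failing_commits failing_commits.log
```

Stopping at the first failed graft leaves the repository in a state where the conflict can be resolved by hand.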

That’s it. We have a master which keeps itself green automatically and a workflow that only differs by two commands from regular trunk-based development.

Complete working example

To show that this is not pure theory, the following is a working example run that shows the complete workflow. To have a complete non-interactive script, sending emails is replaced by writing into the file failing_commits.log.

#!/usr/bin/env bash

set -x

echo "use deterministic output"
export HGPLAIN=1
export HGMERGE=internal:merge-other

echo "first cleanup the world"
rm -rf evolve/ dev1/ dev2/ jenkins/ changegrouphook.sh

echo "then get evolve and cache it."
if [ ! -e evolve-cached.bundle ]; then
    hg clone https://www.mercurial-scm.org/repo/evolve/
    hg -R evolve bundle --all evolve-cached.bundle
else
    hg clone evolve-cached.bundle evolve
fi

export PYTHONPATH="$(realpath evolve)/hgext3rd"

echo "setup jenkins"
hg init jenkins
(cd jenkins; echo -e "[extensions]\nevolve =" >> .hg/hgrc)
cat > changegrouphook.sh <<'EOF'
#!/bin/sh
# Mercurial sets HG_NODE to the first changeset of the incoming changegroup.
hg log --template '{node}\n' -r "$HG_NODE:" >> unprocessed_commits.log
EOF
chmod +x changegrouphook.sh
(cd jenkins; echo -e "[hooks]\nchangegroup.run = ../changegrouphook.sh" >> .hg/hgrc)
(cd jenkins; echo -e "[phases]\npublish = False" >> .hg/hgrc)

echo "start the repo"
(cd jenkins; echo 1 > 1; hg ci -A -m "1"; hg phase --draft --force -r 0; hg phase)

echo "setup developer repos"
for i in dev1 dev2; do
    hg clone jenkins $i
    (cd $i; echo -e "[extensions]\nevolve =\nrebase =\n[ui]\nusername = $i\nmerge-tool = internal:merge-local\n[phases]\npublish = False" >> .hg/hgrc)
done



echo "basic interaction: dev1 works, dev1 pushes"
(cd dev1; echo abc > testfile; hg ci -A -m abc; hg push; hg log --graph)

echo "jenkins checks the changes, accepts them."
(cd jenkins; hg update; hg phase --public -r tip; rm unprocessed_commits.log)

echo "dev2 works, pulls with rebase (keeps changes), pushes"
(cd dev2; echo cde >> testfile; hg ci -A -m cde; hg pull --rebase; hg push; hg log --graph)

echo "dev1 does more work which won’t create conflicts and has to pull, but does not push yet"
(cd dev1; echo dev1-x >> unconflicted; hg ci -A -m dev1-x; hg log --graph; hg push || hg pull --rebase ; hg log --graph)

echo "jenkins checks changes: cde is bad"
echo "for this prototype let’s just assume that the commit failed to build"
(cd jenkins; for i in $(tac unprocessed_commits.log); do hg prune -r $i && (echo "to dev2: commit $i was contained in a failing build. It has been pruned. Please graft and fix it and push again." && echo $i >> ../dev2/failing_commits.log); hg evolve; done; hg log --graph)

echo "dev1 pushes, the push succeeds, because the local hg does not know about the pruning"
(cd dev1; hg push ; hg log --graph)

echo "dev2 does some more work and tries to push, but since the bad commits leading to the good change from dev1 were removed, this would push orphan changesets, so the push fails."
(cd dev2; echo efg > foo; hg ci -A -m efg; hg pull --rebase; hg push; hg log --graph)

echo "dev2 evolves the repo and only has the good changes left."
(cd dev2; hg evolve --any; hg push; hg log --graph)

echo "now dev2 receives the mails and grafts the commits to fix them."
(cd dev2; for i in $(tac failing_commits.log); do hg graft --tool internal:merge-local --hidden $i; done; hg log --graph)

echo "dev2 fixes the commit and pushes the grafted commit and the fix."
(cd dev2; echo abc > testfile; hg ci -A -m "fix: cde should be abc"; hg push; hg log --graph)

echo "jenkins checks the new changes and accepts them."
(cd jenkins; hg update; hg phase --public -r tip; rm unprocessed_commits.log; hg log --graph)

echo "local result: the jenkins repository is as if dev2 had pushed after dev1-x and had fixed the change before pushing"

echo "now dev1 pulls and gets the data about obsoletes"
echo "dev1 can fix this with evolve, or have evolve do it automatically as part of a rebase"
# (cd dev1; hg pull; hg evolve --any; hg log --graph)
(cd dev1; hg pull --rebase; hg log --graph)

echo "global result: now all developers see a history in which dev2 committed after dev1 and fixed the bad commit before pushing"

Our next steps

We now have an option for continuous integration, but a full migration from Git to Mercurial with a history of more than 120 000 commits, 30 developers, and Git-specific infrastructure (e.g. GitLab, for which Mercurial support is added in the fork Heptapod) is a large undertaking.

Therefore we are currently starting to try out short-lived branches for merge requests. These are supported well in our GitLab, and several of our developers have been asking for shared branches for years. However, we took code review out of the branches, because delayed reviews slowed down integration, and the resulting merge failures were too high a cost.

For the time being this article documents how to get an evergreen master without that cost.

The title image was published by Jens Lelie under the Unsplash License.


Disy Tech-Blog