Git - Julio Merino (jmmv.dev)

How to merge multiple Git repositories into one

Are you looking for a method to merge multiple Git repositories into a single one? If so, you have reached the right tutorial!

Please bear with me for a second while I provide you with background information and introduce the subject of our experiments. We’ll get to the actual procedure soon and you will be able to apply it to any repository of your choice.

In the Kyua project, and with the introduction of the kyua-atf-compat component in the Summer of 2012, I decided to create independent Git repositories for each component. The rationale was that, because each component would be shipped as a standalone distfile, they ought to live in their own repositories.

February 15, 2014 · Tags: featured, git, kyua
Continue reading (about 4 minutes)

Projects migrated to Git

I finally took the plunge. Yesterday night, I migrated the Kyua and Lutok repositories from Subversion to Git. And this morning I migrated ATF from Monotone and custom hosting to Git and Google Code; oh, and this took way longer than expected.

Migration of Kyua and Lutok

Migrating these two projects was straightforward. After preparing a fresh local Git repository following the instructions posted yesterday, pushing to Google Code is a simple matter:

$ git remote add googlecode https://code.google.com/p/your-project
$ git push googlecode --all
$ git push googlecode --tags

One of the nice things I discovered while doing this is that a Google Code project supports multiple repositories when the VCS system is set to Git or Mercurial. By default, the system creates the default and wiki repositories, but you can add more at will. This is understandable given that, in Subversion, you have the ability to check out individual directories of a project whereas you cannot do that in the other supported VCSs: you actually need different repositories to group different parts of the project.

I performed the full migration under a Linux box so that I could avail of the most recent Git version along with a proven binary package. The migration went alright, but I encountered a little problem when attempting a fresh checkout from NetBSD: git under NetBSD will not work correctly against SSL servers because it lacks the necessary CA certificates. The solution is to install the security/mozilla-rootcerts package and follow the instructions printed during installation; why this does not happen automatically escapes my mind.

Migration of ATF

I had been having doubts about migrating ATF itself, although if Kyua was moved to Git, it was a prerequisite to move ATF as well. Certainly I could convert the repository to Git, but where could I host it afterwards? Creating a new Google Code project just for this seemed too much of a hassle. My illumination came when I found out, as above, that Google Code supports an arbitrary amount of repositories in a particular project when converting it to Git.

So, for ATF, I just ran mtn git_export with appropriate flags, created a new atf repository on the Kyua site, and pushed the contents there. Along the way, I also decided to kill the home-grown ATF web site and replace it by a single page containing all the relevant information. At this point, ATF and Kyua are supposed to work together quite tightly (in the sense that ATF is just a "subcomponent" of Kyua), so coupling the two projects on the same site makes sense.

Now, let's drink the kool aid.

February 26, 2012 · Tags: atf, git, kyua, lutok
Continue reading (about 3 minutes)

Converting a Subversion repository to Git

As discussed over a week ago, I have been pondering the idea of migrating my projects from Subversion to Git. One of the prerequisites of such a migration is the preparation of a process to cleanly migrate the revision history from the old system to the new one. Of course, such process should attempt to preserve the revision history as close to reality as possible (regardless of what some other big projects have done by just throwing away their history; shrug).

The projects I am interested in migrating from Subversion to Git all live in Google Project Hosting. This is irrelevant for all but one detail I will mention later. However, the project hosting's site is the obvious place to look for help and, as expected, there is a little document titled Convert your project from Subversion to Git that provides a good start.

Summarizing, there are two well-suited tools to convert a Subversion repository to Git:

The built-in git svn command: works pretty well and has a good amount of customization features. The only little problem is that git svn does not recognize Subversion tags as such and therefore converts them to Git branches instead of tags.
The third-party svn2git: while this is actually built on top of git svn, it properly handles Subversion tags and converts them to native Git tags.

If you do not have tags in your Subversion repository or do not care about them being properly converted to Git tags, you can go with git svn. However, I am interested in properly recording tags, so I chose svn2git for my needs.

Now, using svn2git by itself as instructed in the referenced document will happily convert your Subversion repository to Git. You should not have to care about anything else... unless you want to polish a few details mentioned below.

Set up

In the rest of this article, I will place all the scripts for the conversion in a ~/convert/ directory. When the script is run, it will create a ./git/ subdirectory that will end up containing the new Git repository, and a ./control/ directory that will maintain temporary control files.

Disclaimer: Be aware that all the code below is quite ugly and probably bug-ridden. It is not meant to be pretty as I only have two use cases for it. Also please note that I am, by no means, a Git expert, so the process below may be way more convoluted than is actually necessary.

Oh, and this does not deal at all with the wiki changes. Google Code stores such information in the same Subversion repository as your code. However, when migrating to Git, you get two different repositories. Adapting the code below to deal with the wiki component of your project should be easy enough, although I'm not sure I care enough about the few contents I have in the wikis to go through the hassle.

Warning: The instructions below completely mess up tags if you happen to have any in your repository. I only discovered this after following the procedure below and pushing HEAD and the tags, which resulted in a different set of revisions pushed for every tag. When I actually did this, I ended up running the steps below, then discarding the tags, and creating the tags again by hand pointing to the appropriate revisions.

Setting up authors

The first step in using svn2git is defining a mapping between Subversion authors to Git authors. This is easy enough, but there is one caveat that affects projects hosted in Google Code: the root revision of your Subversion repository is not actually authored by you; its author is (no author) so you should account for that. Something like this in your ~/.svn2git/authors file will tae care of this teeny tiny detail:

your-address@example.net = Your Name 
(no author) = Your Name

However, as we will see below, we will be skipping the first revision of the repository altogether so this is actually not really necessary. I just felt like mentioning it for completeness, given that I really didn't expect (no author) to be properly recognized in this context.

References to old Subversion revision ids

Unless you have been extremely terse in your commit history and in your bug tracker, you will have plenty of cross-references pointing to Subversion revision identifiers. For example, if you are fixing a bug introduced a month ago in r123, you may as well point that out in the commit message of r321 for reference purposes. I guess you can see the problem now: if we migrate the history "as is", all these references become meaningless because the new Git history has no traces of the old revision identifiers.

The -m option to svn2git will annotate every revision with a git-svn-id line that contains the relevant information. However, such line is quite ugly because it is not really meant for human consumption: the information in such line is used by git-svn to allow pulls and pushes from a master Subversion repository.

What I want to do is reword the git-svn-id line to turn it into a real sentence rather than some internal control code. We can achieve this with git rebase in interactive mode: mark all revisions as "reword" and then go revision by revision fixing its message. A pain that can be automated: if an editor can do the necessary changes, we can create a fake "editor" script that performs the same modifications.

How? Store this as ~/convert/editor1.sh:

#! /bin/sh

if grep git-svn-id "${1}"; then
    # We are editing a commit message.
    new="This revision was r\1 in Subversion."
    sed -r -i -e "s,git-svn-id[^@]+@([0-9]+).*$,${new}," "${1}"
else
    # We are editing the git-rebase interactive control file.
    sed -i -e 's,^pick,reword,' "${1}"
fi

With this script, we can simply do EDITOR=~/convert/editor1.sh git rebase -i base-revision and get every revision tagged... but this will blow up if your repository contains empty revisions, which takes us to the next concern.

Drop empty revisions

As you probably know, Subversion supports attaching metadata to directories and files in the form of properties. These properties cannot be represented in Git, so, if you have any Subversion commits that touched properties alone, svn2git will happily convert those as empty Git commits.

There is nothing wrong with this, but things like git rebase will choke on these empty commits over and over again... and it gets quite annoying. Furthermore, these empty commits serve no purpose in Git because the changes they performed in Subversion make no sense in Git land. It is easier to just kill them all from the history.

The git rebase command above will abort on every empty revision it encounters. We can, at that point, record their identifiers for later deletion. However, recording the revision identifier will not work because, as we are doing a rebase, the identifier will have changed once we are done. Instead, and because I have been careful to write detailed commit messages, we can rely on the first line of the message (aka the subject) to identify every message. Rerun the rebase as follows, storing the list of empty commits in ../control/empty:

first=$(git log | grep '^commit' | tail -n 1 | cut -d ' ' -f 2)
EDITOR="${convert}/editor1.sh" 
    git rebase --interactive "${first}" || true
touch ../control/empty
while [ -f .git/MERGE_MSG ]; do
    head -n 1 .git/COMMIT_EDITMSG >>../control/empty
    EDITOR="${convert}/editor1.sh" git commit --allow-empty
    EDITOR="${convert}/editor1.sh" git rebase --continue || true
done

With this list in mind, we create another ~/convert/editor2.sh script to remove the empty revisions once we know them:

#! /bin/sh

echo "Empty revisions to be deleted:"
cat ../control/empty | while read line; do
    grep "${line}" "${1}"
    sed -i -e "/^pick ${line}$/s,^pick,fixup," "${1}"
done

Amend the root revision

The root revision in a Google Code subversion repository is empty: the system creates it to initialize the "directory layout" of your repository, but it serves no purpose. We can skip this root revision by passing the --revision=2 flag to svn2git.

However, no matter what we do, the git rebase above to tag Git revisions with their corresponding Subversion identifiers, will happily skip our first real revision and leave it untagged. We have to manually go and fix this, which is actually quite tricky. Luckily, this reply in stackoverflow provides the solution.

Putting it all together

Alright then. If all the above was dense and cryptic code-wise, it is the time to put it all together in a script that performs all the steps for us. Assuming you already have ~/convert/editor1.sh and ~/convert/editor2.sh in place, now create ~/convert/driver.sh as follows:

#! /bin/sh

set -e -x

[ ${#} -eq 1 ] || exit 1
convert=$(dirname ${0})
project=${1}

rm -rf git control
mkdir git control
cd git

# Start at revision 2 to skip the initial empty revision.
svn2git -v --revision=2 -m "http://${project}.googlecode.com/svn"

# Tag git revisions with the original Subversion revision id.
first=$(git log | grep '^commit' | tail -n 1 | cut -d ' ' -f 2)
EDITOR="${convert}/editor1.sh" 
    git rebase --interactive "${first}" || true
touch ../control/empty
while [ -f .git/MERGE_MSG ]; do
    head -n 1 .git/COMMIT_EDITMSG >>../control/empty
    EDITOR="${convert}/editor1.sh" git commit --allow-empty
    EDITOR="${convert}/editor1.sh" git rebase --continue || true
done

# Drop empty revisions recorded in the previous step.
# The list is in the 'empty' file and is read by editor2.sh.
EDITOR="${convert}/editor2.sh" 
    git rebase --interactive "${first}" || true

# Tag the root revision with the original Subversion revision id.
git tag root $(git rev-list HEAD | tail -1)
git checkout -b new-root root
EDITOR="${convert}/editor1.sh" git commit --amend
git checkout @{-1}
git rebase --onto new-root root
git branch -d new-root
git tag -d root

cd -
rm -rf control

Yuck. Well, it works, and it works nicely. Converting a Subversion repository to Git will all the constraints above is now as easy as: ~/convert/driver.sh your-project-name!

February 25, 2012 · Tags: git, google-code, subversion
Continue reading (about 8 minutes)

Switching projects to Git

The purpose of this post is to tell you the story of the Version Control System (VCS) choices I have made while maintaining my open source projects ATF, Kyua and Lutok. It also details where my thoughts are headed to these days.

This is not a description of centralized vs. distributed VCSs, and it does not intend to be one. This does not intend to compare Monotone to Git either, although you'll probably feel like it while reading the text. Note that I have fully known the advantages of DVCSs over centralized systems for many years, but for some reason or another I have been "forced" to use centralized systems on and off. The Subversion hiccup explained below is... well... regrettable, but it's all part of the story!

Hope you enjoy the read.

Looking back at Monotone (and ATF)

I still remember the moment I discovered Monotone in 2004: simply put, it blew my mind. It was clear to me that Distributed Version Control Systems (DVCSs) were going to be the future, and I eagerly adopted Monotone for my own projects. A year later, Git appeared and it took all the praise for DVCSs: developers all around started migrating en masse to Git, leaving behind other (D)VCSs. Many of these developers then went on to make Git usable (it certainly wasn't at first) and well-documented. (Note: I really dislike Git's origins... but I won't get into details; it has been many years since that happened.)

One of the projects in which I chose to use Monotone was ATF. That might have been a good choice at the time despite being very biased, but it has caused problems over time. These have been:

Difficulty to get Monotone installed: While most Linux distributions come with a Monotone binary package these days, it was not the case years ago. But even nowadays if all Linux distributions have binary packages, the main consumers of ATF are NetBSD users, and their only choice is to build their own binaries. This generates discomfort because there is a lot of FUD surrounding C++ and Boost.
High entry barrier to potential contributors: It is a fact that Monotone is not popular, which means that nobody is familiar with it. Monotone's CLI is very similar to CVS, and I'd say the knowledge transition for basic usage is trivial, but the process of cloning a remote project was really convoluted until "recently". The lack of binary packages, combined with complex instructions on just how to fetch the sources of a project only help in scaring people away.
Missing features: Despite years have passed, Monotone still lacks some important features that impact its usability. For example, to my knowledge, it's still not possible to do work-directory merges and, while the interactive merges offered by the tool seem like a cool idea, they are not really practical as you get no chance to validate the merge. It is also not possible, for example, to reference the parent commit of any given commit without looking at the parent's ID. (Yeah, yeah, in a DAG there may be more than one parent, but that's not the common case.) Or know what a push/pull operation is going to change on both sides of the connection. And key management and trust has been broken since day one and is still not fixed. Etc, etc, etc.
No hosting: None of the major project hosting sites support Monotone. While there are some playground hosting sites, they are toys. I have also maintained my own servers sometimes, but it's certainly inconvenient and annoying.
No tools support: Pretty much no major development tools support Monotone as a VCS backend. Consider Ohloh, your favorite bug tracking system or your editor/IDE. (I attempted to install Trac with some alpha plugin to add Monotone support and it was a huge mess.)
No more active development: This is the drop that spills the cup. The developers of Monotone that created the foundations of the project left years ago. While the rest of the developers did a good job in coming up with a 1.0 release by March 2011, nothing else has happened since then. To me, it looks like a dead project at this point :-(

Despite all this, I have been maintaining ATF in its Monotone repository, but I have felt the pain points above for years.

Furthermore, the few times some end user has approached ATF to offer some contribution, he has had tons of trouble getting a fresh checkout of the repository and given up. So staying with Monotone hurts the project more than it helps.

The adoption of Subversion (in Kyua)

To fix this mess, when I created the Kyua project two years ago, I decided to use Subversion instead of a DVCS. I knew upfront that it was a clear regression from a functionality point of view, but I was going to live with it. The rationale for this decision was to make the entry barrier to Kyua much lower by using off-the-shelf project hosting. And, because NetBSD developers use CVS (shrugh), choosing Subversion was a reasonable choice because of the workflow similarities to CVS and thus, supposedly, the low entry barrier.

Sincerely, the choice of Subversion has not fixed anything, and it has introduced its own trouble. Let's see why:

ATF continues to be hosted in a Monotone repository, and Kyua depends on ATF. You can spot the problem, can't you? It's a nightmare to check out all the dependencies of Kyua, using different tools, just to get the thing working.
As of today, Git is as popular, if not more, than Subversion. All the major operating systems have binary packages for Git and/or bundle Git in their base installation (hello, OS X!). Installing Git on NetBSD is arguably easier (at least faster!) than Subversion. Developers are used to Git. Or let me fix that: developers love Git.
Subversion gets on the way more than it helps; it really does once you have experienced what other VCSs have to offer. I currently maintain independent checkouts of the repository (appropriately named 1, 2 and 3) so that I can develop different patches on each before committing the changes. This gets old really quickly. Not to mention when I have to fly for hours, as being stuck without an internet connection and plain-old Subversion... is suboptimal. Disconnected operation is key.

The fact that Subversion is slowing down development, and the fact that it really does not help in getting new contributors more than Git would, make me feel it is time to say Subversion goodbye.

The migration to Git

At this point, I am seriously considering switching all of ATF, Lutok and Kyua to Git. No Mercurial, no Bazaar, no Fossil, no anything else. Git.

I am still not decided, and at this point all I am doing is toying around the migration process of the existing Monotone and Subversion repositories to Git while preserving as much of the history as possible. (It's not that hard, but there are a couple of details I want to sort out first.)

But why Git?

First and foremost, because it is the most popular DVCS. I really want to have the advantages of disconnected development back. (I have tried git-svn and svk and they don't make the cut.)
At work, I have been using Git for a while to cope with the "deficiencies" of the centralized VCS of choice. We use the squashing functionality intensively, and I find this invaluable to constantly and shamelessly commit incomplete/broken pieces of code that no-one will ever see. Not everything deserves being in the recorded history!
Related to the above, I've grown accustomed to keeping unnamed, private branches in my local copy of the repository. These branches needn't match the public repository. In Monotone, you had this functionality in the form of "multiple heads for a given branch", but this approach is not as flexible as named private branches.
Monotone is able to export a repository to Git, so the transition is easy for ATF. I have actually been doing this periodically so that Ohloh can gather stats for ATF.
Lutok and ATF are hosted in Google Code, and this hosting platform now supports Git out of the box.
No Mercurial? Mercurial looks a lot like Monotone, and it is indeed very tempting. However, the dependency on Python is not that appropriate in the NetBSD context. Git, without its documentation, builds very quickly and is lightweight enough. Plus, if I have to change my habits, I would rather go with Git given that the other open source projects I am interested in use Git.
No Bazaar? No, not that popular. And the fact that this is based on GNU arch makes me cringe.
No Fossil? This tool looks awesome and provides much more than DVCS functionality: think about distributed wiki and bug tracking; cool, huh? It also appears to be a strong contender in the current discussions of what system should NetBSD choose to replace CVS. However, it is a one-man effort, much like Monotone was. And few people are familiar with it, so Fossil wouldn't solve the issue of lowering the entry barrier. Choosing Fossil would mean repeating the same mistake as choosing Monotone.

So, while Git has its own deficiencies — e.g. I still don't like the fact that it is unable to record file moves (heuristics are not the same) — it seems like a very good choice. The truth is, it will ease development by a factor of a million (OK, maybe not that much) and, because the only person (right?) that currently cares about the upstream sources for any of these projects is me, nobody should be affected by the change.

The decision may seem a bit arbitrary given that the points above don't provide too much rationale to compare Git against the other alternatives. But if I want to migrate, I have to make a choice and this is the one that seems most reasonable.

Comments? Encouragements? Criticisms?

February 11, 2012 · Tags: atf, git, kyua, lutok, monotone, vcs
Continue reading (about 8 minutes)

Talk about Git

I've been using Git (or better said Cogito) recently as part of my PFC and, although I don't like the way Git was started, I must confess I like it a lot. In some ways it is very similar to Monotone (the version control system I prefer now) but it has its own features that make it very interesting. One of these is the difference between local and remote branches, something I'll talk about in a future post.

For now I would just like to point you to a talk about Git by Linus given at Google. He focuses more on general concepts of distributed version systems than on Git itself, so most of the ideas given there apply to many other systems as well. If you still don't see the advantages of distributed VCSs over centralized ones, you must watch this. Really. Oh, and it is quite "funny" too ;-)

May 19, 2007 · Tags: cogito, dvcs, git, monotone
Continue reading (about 1 minute)

Posts: Git

How to merge multiple Git repositories into one

Projects migrated to Git

Converting a Subversion repository to Git

Switching projects to Git

Talk about Git

Featured software

Featured posts