A brief look into Fedora's packaging infrastructure

As you probably know, I have been a long-time “evangelist” of pkgsrc. I started contributing to this packaging system when I first tried NetBSD in 2001 by sending new packages for small tools, and I later became a very active contributor while maintaining the GNOME packages. My active involvement came to an end a few years ago when I switched to OS X, but I still maintain a few packages and use pkgsrc on several machines: a couple of NetBSD systems and three OS X machines.

Anyway. pkgsrc is obviously not everything in this world, and if I realistically want other people to use my software, there have to be binary packages for more mainstream systems. Let’s face it: nobody in their right mind is going to come over to my project pages, download the source package, mess around with dependencies that do not have binary packages either, and install the results. Supposedly, I would need just one such person, who by coincidence would also be a packager for a mainstream distribution, to jump through all these hoops and create the corresponding binary packages. Ah yes, what I said: not gonna happen anytime soon.

Sooooo… I spent part of the past week learning (again) how to create binary packages for Fedora and, to put this into practice, I prepared an RPM of lutok and pushed it to Fedora rawhide and Fedora 16. All in all, it has been a very pleasant experience, and the whole point of this post is to provide a small summary of the things I have noticed. Because I know pretty well how pkgsrc behaves and what its major selling points are, I am going to provide some pkgsrc-related comments in the text below.

Please note that the text below lacks many details and that it may claim some facts that are not completely accurate. I’m still a novice in Fedora development land.

First, let’s start by describing the basic components of a package definition:

spec file: The spec file of a package is RPM’s “source description” of how to build a package (pkgsrc’s Makefile) and also includes all the package’s metadata (pkgsrc’s PLIST, DESCR, etc.). This does not include patches nor the original sources. I must confess that having all the details in a single file is very convenient.
File lists: Contrary to a (common?) misconception, spec files can and do have an explicit list of the files to be included in the package (same as pkgsrc’s PLIST). This list can include wildcards to make package maintenance easier (e.g. you can avoid having to list all files generated by Doxygen and just include the directory name, which will just do the right thing). No matter what you do, the build system checks that every file installed by the package appears in the file list, which ensures that the package is complete.
SRPMs: Think about this as a tarball of all the files you need to build a particular package, including the spec file, the source tarball and any additional patches. These files are very convenient to move the source package around (e.g. to publish the package for review or to copy it to a different machine for rebuilding) and also to upload the package to Koji’s build system (see below).
Subpackages: Oh my… what a relief compared to pkgsrc’s approach to this. Creating multiple independent packages from a single spec file is trivial, to the point where providing subpackages is encouraged rather than being a hassle. For what it’s worth, I have always liked the idea of splitting development files from main packages (in the case of libraries), which in many cases helps in trimming down dependencies. pkgsrc fails miserably here: if you have ever attempted to split a package into subpackages to control the dependencies, you know what a pain the process is… and the result is a collection of unreadable Makefiles.
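
To make the components above a bit more concrete, here is a rough sketch of what a spec file with an explicit file list and a development subpackage can look like. This is not lutok’s actual spec file: the package name libexample, its version, its URLs and the files it installs are all made up for illustration, and the build steps assume a plain autotools-style package.

```spec
# Hypothetical library package: the name, version, URLs and file names
# below are placeholders, not taken from any real package.
Name:           libexample
Version:        1.0
Release:        1%{?dist}
Summary:        Example shared library
License:        BSD
URL:            http://example.org/libexample
Source0:        http://example.org/libexample-%{version}.tar.gz

%description
Runtime files for a hypothetical shared library.

# Subpackage: the same spec file also produces libexample-devel.
%package devel
Summary:        Development files for %{name}
Requires:       %{name}%{?_isa} = %{version}-%{release}

%description devel
Headers and linker files needed to build software against %{name}.

%prep
%setup -q

%build
%configure --disable-static
make %{?_smp_mflags}

%install
make install DESTDIR=%{buildroot}
# Drop files we do not want to ship; anything left unpackaged fails the build.
find %{buildroot} -name '*.la' -delete

# File list of the main package; the wildcard covers the versioned library.
%files
%doc README
%{_libdir}/libexample.so.*

# File list of the subpackage; naming a directory includes all of its contents.
%files devel
%{_includedir}/libexample/
%{_libdir}/libexample.so
%{_libdir}/pkgconfig/libexample.pc
```

Note how the split between runtime and development files is just a second pair of %package and %files sections rather than a separate package definition.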

Now let’s talk a bit about guidelines and access control:

Policies: I was going to write about documentation in this point, but what I really wanted to talk about are policies. There are several policies governing packaging rules, and the important thing is that they are actually documented (rather than being tribal knowledge). The other nice thing is that their documentation is excellent; just take a moment to skim through the Packaging Guidelines page and you will see what I mean. The packaging committee is in charge of editing these policies whenever necessary.
Review process: Any new package must go through a peer review process. Having grown accustomed to Google’s policy of having every single change to the source tree peer-reviewed, I can’t stress how valuable this is. It may seem like a burden to newcomers, but really, it is definitely worth it. The review process is quite exhaustive, and from what I have seen so far, the reviewers tend to be nice and reasonable. As an example, take a look at lutok’s review.
Repository and ACLs: The source files that describe a package (mainly a spec file and a sources file) are stored in a Git repository (I believe there is a different repository for every package, but I may be wrong). This is nothing unusual, but the nice thing is that each package has its own read/write ACLs. New maintainers have access to their own packages only, which means that the barrier to entry can be lowered while resting assured that such contributors cannot harm the rest of the packages until they have gained enough trust. Of course, there is a set of trusted developers that can submit changes to any and every package.

“But you said packaging infrastructure in the title!”, you say. I know, I know, and this is what I wanted to talk most about, so here it goes:

Common tools: Other than the well-known rpm and yum utilities, developers have access to rpmbuild and fedpkg. rpmbuild would be rpm’s counterpart, in the sense that it is the lowest level of automation and exposes many details to the developer. fedpkg, on the other hand, is a nice wrapper around the whole packaging process (involving git, mock builds, etc.).
Koji: Koji is Fedora’s build system, ready to build packages for you on demand from a simple command-line or web interface. Koji can be used to test the build of packages during the development process on architectures that the developer does not have (the so-called “scratch builds”). However, Koji is mainly used to generate the final binary packages that are pushed into the distribution. Once the packager imports a new source package into the repository, he triggers the build of binary packages to include them later into the distribution.
Bodhi: Bodhi is Fedora’s update publishing system. When a packager creates a new version of a particular package and wishes to push such an update to a formal release (say, Fedora 16), the update is first posted in Bodhi. Then, there is a set of scripts, rules and peer reviews that either approve the update for publication on the branch or not.

Let’s now talk a bit about pkgsrc’s touted strengths and how they compare to Fedora’s approach:

Mass fixes: In pkgsrc, whenever a developer wants to change the infrastructure, he can do the change himself and later adjust all existing packages to conform to the modification. In Fedora, because some particular developers have write access to all packages, it certainly seems possible to apply a major fix and/or rototill to all packages in the same manner as is done in pkgsrc. Such a developer could also trigger a rebuild of all affected packages using a specific branch for testing purposes and later ensure that the modified packages still work.
Isolated builds: buildlink3 is an awesome pkgsrc technology that isolates the build of a particular package from the rest of the system by means of symlinks and wrapper scripts. However, pkgsrc is not alone. Mock is Fedora’s alternative to this: Mock provides a mechanism to build packages in a chroot environment to generate deterministic packages. The tools used to generate the “blessed” binary packages for a distribution (aka Koji) use this system to ensure the packages are sane.
Bulk builds: This is a term very familiar to pkgsrc developers, so I’m just mentioning it in passing because this is also doable in RPM-land. While package maintainers are responsible for building the binary packages of the software they maintain (through Koji), authorized users (e.g. release engineering) can trigger rebuilds of any or all packages.

And, lastly, let’s raise the few criticisms I have up to this point:

Lack of abstractions: spec files seem rather arcane compared to pkgsrc Makefiles when it comes to generalizing packaging concepts. What I mean by this is that spec files seem to duplicate lots of logic that would better be abstracted in the infrastructure itself. For example: if a package installs libraries, it is its responsibility to call ldconfig during installation and deinstallation. I have seen that some things that used to be needed in spec files a few years ago are now optional because they have moved into the infrastructure, but I believe there is much more that could be done. (RHEL backwards compatibility hurts here.) pkgsrc deals with these situations automatically depending on the platform, and extending the pkgsrc infrastructure to support more “corner cases” is easier.
No multi-OS support: One of the major selling points of pkgsrc is that it is cross-platform: it runs under multiple variants of BSD, Linux, OS X, and other obscure systems. It is true that RPM also works on all these systems, but Fedora’s packaging system (auxiliary build tools, policies, etc.) does not. There is not much more to say about this given that this is an obvious design choice of the developers.
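
As an illustration of the ldconfig duplication mentioned under “Lack of abstractions”, this is roughly the boilerplate that a spec file for a package installing shared libraries carries. Take it as a sketch of the convention as I understood it while packaging for Fedora 16; whether a given release still requires it is something to check against the Packaging Guidelines.

```spec
# Refresh the dynamic linker cache after the package is installed...
%post -p /sbin/ldconfig

# ...and again after it is removed.
%postun -p /sbin/ldconfig
```

Every library package repeats these same two scriptlets, which is exactly the kind of logic that pkgsrc keeps in its infrastructure instead of in each package.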

To conclude: please keep in mind that the above is not intended to describe Fedora’s system as a better packaging system than pkgsrc. There are some good and bad things in each, and what you use will depend on your use case or operating system. What motivated me to write this post were just a few small things like Koji, Bodhi and subpackages, but I ended up writing much more to provide context and a more detailed comparison with pkgsrc. Now draw your own conclusions! ;-)
