Right after discussing where rc.d should live, it’s time to tackle a different but related pet peeve of mine: the location of the installed packages database. For this, I’m going to focus on the system I know best, pkgsrc, which keeps its database under /var/db/pkg/ by default. I think this location is wrong and the database should move to /usr/pkg/libdata/pkgdb/.

From a cursory look, it seems that FreeBSD’s and OpenBSD’s ports databases, as well as dpkg’s and rpm’s, are also affected by this “problem”—but I do not know enough about their internals to say with certainty.

Let’s see why placing the installed packages database under /var/ is suboptimal and why libdata is a good alternative.

UPDATE (11:30 EDT): As Jonathan Perkin points out, the pkgdb already moved under the installation prefix four years ago for all bootstrap builds. I never noticed because I had been overriding the default value all along 🤦‍♂️. That said, the “problem” still remains for native, non-bootstrapped NetBSD builds, so this article is still valid 😉

Background: What is /var for?

Every packaging system that you use, be it pkgsrc, ports, dpkg, rpm, or whatever, must maintain a database to track the metadata of all installed packages. What this database contains varies across systems, but in general it lists the installed packages, their textual description, the collection of files each provides and their checksums, etc.

If we look at the I-bet-unknown-yet-super-informative hier(7) manual page, we find the following descriptions in NetBSD:

HIER(7)                Miscellaneous Information Manual                HIER(7)

NAME
     hier - layout of file systems

DESCRIPTION
     [...]

     /var/      Multi-purpose log, temporary, transient, and spool files.

                [...]

                db/        Miscellaneous automatically generated system-
                           specific database files, and persistent files used
                           in the maintenance of third party software.

                           pkg          Default location for metadata related
                                        to third party software packages.  See
                                        pkg_add(1) for more details of the
                                        NetBSD Packages Collection, or pkgsrc.

If you ask me… just by looking at this, the singling out of pkg in this list feels like an ad-hoc addition to justify the decision of placing pkg under /var/db/. But that’s just a feeling, so let’s continue investigating.

If we look at this same manual page in Debian:

HIER(7)                  Linux Programmer's Manual                  HIER(7)

NAME
       hier - description of the filesystem hierarchy

DESCRIPTION
       [...]

       /var   This directory contains files which may change in size,  such
              as spool and log files.

       [...]

       /var/lib
              Variable state information for programs.

       [...]

       /var/lib/<pkgtool>
              Packaging support files (optional).

… we find a similar call-out for the package manager’s database under what seems to be an arbitrary location in /var/. Also note that the text lists <pkgtool> as a placeholder instead of explicitly mentioning the dpkg directory. That’s because this manual page is generic and is not specific to Debian; such is the mess of Linux distributions… but that’s a rant for another day. It does, however, bring us to the Filesystem Hierarchy Standard (FHS), which is where this manual page comes from.

The FHS contains more information than the hier(7) manual page on what each part of the file system tree is for. Quoting FHS 3.0 section 5.1:

/var contains variable data files. This includes spool directories and files, administrative and logging data, and transient and temporary files.

Some portions of /var are not shareable between different systems. For instance, /var/log, /var/lock, and /var/run. Other portions may be shared, notably /var/mail, /var/cache/man, /var/cache/fonts, and /var/spool/news.

/var is specified here in order to make it possible to mount /usr read-only. Everything that once went into /usr that is written to during system operation (as opposed to installation and software maintenance) must be in /var.

Emphasis mine. That last part is key.

Why does pkgdb not belong in /var?

Based on the descriptions above, /var/ is a place for files that might change during the operation of the system: transient files, logs, caches, spools, databases, etc.

If you install a print server, it’s reasonable for the server to maintain all print jobs under /var/: the jobs are mutable data that vary with the users’ activity on the system, and they belong to the system with the attached printer. If you install a PostgreSQL server, it’s reasonable for the databases to be stored under /var/: their on-disk state depends directly on end-user activity and the files belong to the machine running the server.

But just because the package manager maintains a database of installed files does not mean that this database belongs in /var/. And unfortunately, I think we ended up in this situation simply because someone thought of this as a “database” and assumed “must be in /var/” based on its name, without giving it a second thought. And then this pattern stuck and spread across systems.

IMPORTANT: The key distinction between the packages database and an arbitrary database is that the packages database changes in unison with the files of the installed packages.

It might be easier to visualize this argument if we go back to the idea that /usr/ is a read-only tree that can be shared across machines and /var/ is the writable tree that is specific to each machine. Under this scenario, if we share /usr/, the list of packages available on each machine is exactly the same among them. Ergo the database should also be shared across those machines and therefore /var/ is the wrong location for it.

This argument only holds if the contents of the packages database are immutable unless we are adding, removing, or updating packages. Those operations require write access to /usr/ anyway, so they might as well update the database if it lived under that tree. I know that’s true of pkgsrc but cannot say the same for other systems, which is why I won’t speculate much about them.

A different argument: because the contents of /var/ are supposed to be transient, it should be acceptable to lose almost all of them and still be left with a functional system. However, if we lose the package database, we will be left with a system that will be very difficult to operate. Yes, it’ll keep running, but good luck trying to convince the package manager to do anything.

Where should pkgdb live instead?

So, if /var/ isn’t the right place for the packages database, where is? Referring back to hier(7):

HIER(7)                Miscellaneous Information Manual                HIER(7)

NAME
     hier - layout of file systems

DESCRIPTION
     [...]

     /usr/      Contains the majority of the system utilities and files.

                [...]

                libdata/  Miscellaneous utility data files.

                [...]

                pkg/      Installed third-party software packages.

                          [...]

                          libdata/  Package data files.

                [...]

                share/    Architecture-independent files, mostly text.

libdata sounds like the right location: the database is a collection of data files but these are not necessarily plain text (so we can rule out share). Yes, I know: libdata is an oddity typically only seen in BSD systems and it’s documented by neither Debian’s hier(7) nor the FHS, so /usr/lib/<pkgtool>/ would also be an acceptable location on Linux systems.

Based on this, I’ve been using /usr/pkg/libdata/pkgdb/ and /opt/pkg/libdata/pkgdb/ for ages in my pkgsrc deployments (the former for NetBSD and the latter for macOS) and never experienced an issue, although going against the defaults is cumbersome.

How can you make it happen?

Convincing pkgsrc to use a different location for the pkgdb is easy. The first thing to do is tell the pkg_install(8) tools where the database should be. You can do this by defining the PKG_DBDIR variable in /etc/pkg_install.conf:

netbsd:~> cat /etc/pkg_install.conf
PKG_DBDIR=/usr/pkg/libdata/pkgdb
netbsd:~>
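
If you want to double-check the setting, here is a minimal sketch in POSIX shell. It assumes the file only contains simple NAME=value lines, and the function names (pkgdb_dir_from_conf, is_under_prefix, check_pkgdb_location) are made up for this example; they are not part of pkg_install.

```sh
# Print the value of PKG_DBDIR found in a pkg_install.conf-style file.
# Assumption: plain "PKG_DBDIR=value" lines; the last assignment wins.
pkgdb_dir_from_conf() {
    sed -n 's/^PKG_DBDIR=[[:space:]]*//p' "$1" | tail -n 1
}

# Succeed if directory $1 is located somewhere below prefix $2.
is_under_prefix() {
    prefix=${2%/}   # tolerate a trailing slash in the prefix
    case "$1" in
        "$prefix"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report where the configured database lives relative to the prefix.
check_pkgdb_location() {
    dir=$(pkgdb_dir_from_conf "$1")
    if [ -z "$dir" ]; then
        echo "PKG_DBDIR is not set in $1"
    elif is_under_prefix "$dir" "$2"; then
        echo "$dir is under $2"
    else
        echo "$dir is outside $2"
    fi
}
```

Running check_pkgdb_location /etc/pkg_install.conf /usr/pkg against the file shown above should report that the database is under the prefix.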

But this is not enough. Some packages—particularly packaging tools in the pkgtools category—end up hardcoding the value of PKG_DBDIR during their build and they might not recognize pkg_install.conf at runtime. Therefore, it’s safest for you to also add the same PKG_DBDIR setting mentioned above to /etc/mk.conf so that any custom-built packages pick it up. And yes, this means that using binaries built with a different PKG_DBDIR than the one you configure in /etc/pkg_install.conf can lead to subtle problems.
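
Because a mismatch between the two files is the thing to avoid, a quick consistency check can help. The following is only a sketch: it assumes both files use a plain PKG_DBDIR=value assignment (optionally with whitespace around the equals sign, as is common in mk.conf), it does not understand make operators such as ?= or variable expansion, and the function names are hypothetical.

```sh
# Extract the last plain "PKG_DBDIR=value" assignment from a file.
# Whitespace around "=" is tolerated; "?=", "+=" and make variable
# expansion are NOT handled by this sketch.
read_pkg_dbdir() {
    sed -n 's/^PKG_DBDIR[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# Succeed only when both files set PKG_DBDIR to the same non-empty value.
pkg_dbdir_settings_agree() {
    a=$(read_pkg_dbdir "$1")
    b=$(read_pkg_dbdir "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, pkg_dbdir_settings_agree /etc/pkg_install.conf /etc/mk.conf || echo "PKG_DBDIR mismatch" would flag the situation described above before you build anything.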

If you are bootstrapping pkgsrc using the bootstrap script, know that the script already defaults to putting the packages database somewhere under the prefix (but not under libdata). If you still want to change the location, pass the --pkgdbdir /usr/pkg/libdata/pkgdb flag to the script. Obviously adjust /usr/pkg/ to match the prefix under which you are bootstrapping.
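
To make the prefix adjustment concrete, here is a small sketch that prints, rather than runs, the bootstrap invocation for a given prefix. The --pkgdbdir flag is the one mentioned above; --prefix is, as far as I know, the script's option for choosing the installation prefix, and the helper name bootstrap_command is invented for this example.

```sh
# Print (do not run) a bootstrap invocation that keeps the packages
# database under the given prefix.
bootstrap_command() {
    prefix=${1%/}   # drop a trailing slash, if any
    printf './bootstrap --prefix %s --pkgdbdir %s/libdata/pkgdb\n' \
        "$prefix" "$prefix"
}
```

Calling bootstrap_command /opt/pkg prints the command line I would use on macOS; review it and run it from the pkgsrc bootstrap directory yourself.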

And if you happen to be using pkg_comp for package compilation, then simply set PKG_DBDIR once in the default.conf configuration file and pkg_comp will know how to handle everything for you.

But ideally… we would fix the default setting in pkgsrc to be more sensible, as was already done for bootstrap. If I recall correctly, I mentioned this at some point years ago in the mailing lists and got positive feedback, so maybe this is feasible after all without a ton of pain. But I’m not sufficiently involved in pkgsrc’s development these days to effect this change. Want to give it a try? 😉
