The GNU project is the source of the Unix userland utilities used on most Linux distributions. Its compatibility with standards and other Unix systems, or lack thereof, directly impacts the overall portability of any piece of software developed on GNU/Linux¹ installations.

Given that GNU/Linux has “triumphed” over pretty much every other Unix-like system, it is likely that you are developing on a GNU/Linux system, and it is also likely that you have little or no exposure to other Unix systems. That said, other Unix systems still exist today, and they do not carry the GNU userland with them: macOS is the prominent example, as it comes with a BSD core, and the various BSDs are still around and kicking.

Unfortunately, the GNU userland does not closely adhere to standards; its own coding standards go as far as stating that the GNU Project regards standards published by other organizations “as suggestions, not orders.” What’s more, if we look at the manual pages or Info documents of most GNU tools, we won’t find any mention of which standards they conform to. For example, the GNU Coreutils manual contains 91 instances of the word POSIX, but skimming through the text reveals that these are there to call out small incompatibilities or implementation choices.

Compare this to an arbitrary manual page from a BSD system, such as ls(1) from FreeBSD 13:

STANDARDS
     With the exception of options -g, -n and -o, the ls utility conforms to
     IEEE Std 1003.1-2001 ("POSIX.1") and IEEE Std 1003.1-2008 ("POSIX.1").
     The options -B, -D, -G, -I, -T, -U, -W, -Z, -b, -h, -w, -y and -, are
     non-standard extensions.

     The ACL support is compatible with IEEE Std 1003.2c ("POSIX.2c") Draft 17
     (withdrawn).

BUGS
     [...]

     IEEE Std 1003.2 ("POSIX.2") mandates opposite sort orders for files with
     the same timestamp when sorting with the -t option.

In the FreeBSD case, you are almost always aware of which behaviors are standard and which are not, because the vast majority of its manual pages carry notes like these. This is important because you can then make an informed decision on whether you want to write software that is or isn’t standards-compliant. Or, at the very least, having this kind of information “in your face” all the time may make you aware that portability is still a real struggle in 2021.

Which brings us to the real problem with the GNU userland: GNU tools contain many extensions that are incompatible with other, more traditional Unix systems. We can classify these into two groups:

Usability extensions: These are extensions that are meant to improve the interactive experience of the system. These can actually be very useful, such as ls --color, and are perfectly fine extensions—as long as they are restricted to interactive usage alone.
Gratuitous incompatibilities: These are little differences here and there, like the == operator in test(1), that creep into scripts and documentation and that have the (unintended?) consequence of making programs GNU-specific for no good reason.

It is the latter kind, the gratuitous incompatibilities, that are truly problematic because they are a source of vendor lock-in, intentionally or not². And because of GNU/Linux’s predominance, you are likely to be using these extensions already without knowing that they can—and will—cause trouble down the road.

In this post, I present the GNU-specific gratuitous incompatibilities that plague programs, shell scripts in particular, and that are easy to avoid. The goal of this post is to help you identify these pitfalls and write more portable software. So, if at all possible, I’d encourage you to use the standard counterparts, unless you need the more advanced GNU-specific features that have no simple replacements. The topics that I’ll cover are command-line differences, Bashisms, find without a directory, awk gensub(), and make $(shell …) expansion.

Most of the content here comes from my past experience porting lots of software (primarily GNOME 2.x), originally created on GNU/Linux systems, over to NetBSD. The title of this post, by the way, is in honor of the Useless use of cat meme, which documents unnecessary uses of the cat utility where other tools already provide the same functionality.

Command-line differences

One thing that GNU does well is handle command-line arguments uniformly across all of its tools. Unfortunately, this is done via common libraries that contain gratuitously incompatible behaviors, which means that such incompatibilities spread throughout the whole system. Let’s peek at a few.

Flags after arguments

GNU tools use the getopt function to process options. This function is standard, but GNU added a major divergent behavior: the glibc implementation reorders the arguments so that options are accepted anywhere in the command line, not just before the first non-option argument (the first operand).

This means that ls foo -l lists foo in long format on GNU/Linux, but on other systems -l is treated as the name of a file to list, so you get an error about a missing file named -l. The standards-compliant counterpart is trivial to achieve: move the flags earlier, as in ls -l foo.
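To see the standard rule in action from a shell script, here is a minimal sketch using the POSIX getopts builtin, which, like a standard getopt(3), stops parsing at the first operand. The parse function and its single -l flag are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical option parser that accepts a single -l flag.
# getopts stops at the first operand, as POSIX requires, so any
# flags that follow an operand are left in place as operands.
parse() {
    long=no
    OPTIND=1    # reset so that parse can be called more than once
    while getopts l opt; do
        case $opt in
            l) long=yes ;;
            *) return 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    printf 'long=%s operands=%s\n' "$long" "$*"
}

parse -l foo    # long=yes operands=foo
parse foo -l    # long=no operands=foo -l
```

The second call mirrors what ls foo -l does outside GNU/Linux: the trailing -l is just another operand.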

If you are writing your own tool that calls into getopt(3), you should detect whether you are using the GNU variant or not—and, if so, pass the + character as the first character of the options string. This will make the tool behave consistently across systems.

For further reference, see “The Problem With GNU getopt; Or, On Standards”.

Long flag names

In the previous section, I said that GNU tools use the getopt function to process options. This isn’t quite true though, as most of the GNU tools actually use getopt_long (a GNU addition) to support both short and long option names. The getopt_long(3) manual page of FreeBSD 13 says:

HISTORY
     The getopt_long() and getopt_long_only() functions first appeared in the
     GNU libiberty library.  The first BSD implementation of getopt_long()
     appeared in NetBSD 1.5, the first BSD implementation of
     getopt_long_only() in OpenBSD 3.3.  FreeBSD first included getopt_long()
     in FreeBSD 5.0, getopt_long_only() in FreeBSD 5.2.

Personally, I find long option names nice for usability and for self-documenting command invocations. The problem is that GNU tools expose standard options both via their short, standard names and via long, GNU-specific names.

This is a problem because a call like grep --ignore-case foo is unnecessarily GNU-specific. The standard equivalent would be grep -i foo, and it’d do the exact same thing.
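As a small illustration, the calls below use only the standard short flags, with the equivalent GNU long spellings noted in comments. The sample input is made up for the example.

```shell
#!/bin/sh
# Sample input; any text would do.
input='Foo
bar
foo'

# GNU-only spelling: grep --ignore-case foo
matches=$(printf '%s\n' "$input" | grep -i foo)

# GNU-only spelling: grep --count --invert-match foo
others=$(printf '%s\n' "$input" | grep -c -v foo)

printf '%s\n' "$matches"   # Foo, then foo
printf '%s\n' "$others"    # 2
```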

Note that I’m specifically talking here about standard options that are also exposed via a long name. For options that enable GNU-specific behavior, such as ls --color, feel free to use the long name: the functionality doesn’t exist elsewhere, so you can’t write the call in a portable way anyway. In fact, I’d suggest using the long names in these cases because they make it obvious to the reader that a GNU extension was used on purpose.

Bashisms

I was originally going to title this post “Useless use of Bash” but, while compiling content for it, I found enough material to cover GNU as a whole.

Bash is now the de facto shell interpreter on Unix platforms, which is fine because it’s a nice, user-friendly interactive shell. That said, Bash isn’t available everywhere by default: the BSDs don’t provide it out of the box; macOS is moving away from it due to licensing reasons; and Debian switched /bin/sh to dash for performance and standards-conformance reasons. In other words: /bin/sh may not be Bash on mainstream platforms, and writing scripts that start with #! /bin/sh and that use Bash-specific features is bound to be problematic.

Unfortunately, Bash is so widespread that its gratuitous incompatibilities with the standards mean that a lot of shell scripts that could/should be portable are not. If you choose to make use of Bash’s advanced features, such as indexed and associative arrays, then you should mark your scripts as Bash-dependent by writing #! /bin/bash at the top (or #! /usr/bin/env bash, as Bash typically lives in /usr/local/bin on the BSDs). But if you do not truly intend for your scripts to be Bash-specific, it’s best to avoid the little incompatibilities, also referred to as Bashisms.

And by the way, if you truly want to write complex shell scripts… review these sh tricks beforehand for some extra horrors. Maybe you’ll reconsider your choices.

The == operator

The test command, also known as [ (yes, really), uses the = operator for equality comparisons such as in [ a = b ]. The GNU version of this command also accepts == for equality comparisons. This is an unnecessary extension that has caused countless portability problems, especially in configure scripts, and that has no reason to exist. Given how trivial it is to avoid, just refrain from using == in calls to [.
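The portable spelling is a single =. A minimal sketch follows; the is_yes helper is hypothetical, and quoting the argument keeps the test well-formed even when the value is empty.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when its first argument is exactly "yes".
# Non-portable spelling of the same test: [ "$1" == yes ]
is_yes() {
    [ "$1" = yes ]
}

answer=yes
if is_yes "$answer"; then
    echo "proceeding"
else
    echo "skipping"
fi
```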

Note that, while I’m filing this item under the Bash section, this issue applies to both the test builtin in Bash and the test command supplied by GNU Coreutils. Both include the same extension.

The [[ and ]] extension

The test and [ commands have numerous limitations in how they handle arguments. Bash provides its own alternative, spelled [[, which is a shell keyword and is therefore guaranteed to be evaluated in-process, and which offers many goodies such as the ability to use < and > for string comparisons.

Being guaranteed to be built-in is useful in some cases, but using [[ is overkill in many others. For example: if you write [[ a == b ]] or [[ 3 < 4 ]], you are making your script unnecessarily unportable; you could have equally written [ a = b ] and [ 3 -lt 4 ] to obtain the same results while being portable. (In fact, < inside [[ compares strings lexicographically, so [[ 10 < 9 ]] is true; for numbers, -lt is what you want anyway.)
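For instance, a compound [[ ]] test can usually be rewritten as two [ calls joined with the shell’s own && operator (preferable to the -a operator of test, which POSIX marks as obsolescent). The check function and its arguments below are hypothetical.

```shell
#!/bin/sh
# Hypothetical access check.
# Bash-only spelling: [[ $name == admin && $count -lt 4 ]]
check() {
    name=$1
    count=$2
    # Two [ calls joined by the shell's && replace the single [[ ]] test.
    if [ "$name" = admin ] && [ "$count" -lt 4 ]; then
        echo ok
    else
        echo denied
    fi
}

check admin 3     # ok
check admin 10    # denied
check guest 3     # denied
```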

That said, if you do have reasons to rely on [[ and ]] in your script at any point, I’d encourage you to be consistent throughout the code and use that syntax alone.

The function keyword

Bash lets you define functions using two different syntaxes: myfunc() { ... } and function myfunc() { ... }. These two do the same, but the function keyword is not standard. As a result, simply introducing it in your scripts makes them Bash-specific for no good reason. Avoid using the keyword altogether.
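The standard form is shown below with a hypothetical greet function; it works unchanged in Bash as well as in dash and the BSD shells.

```shell
#!/bin/sh
# Portable function definition: name() followed by a compound command.
# Non-portable equivalents: function greet() { ...; } and function greet { ...; }
greet() {
    printf 'hello, %s\n' "$1"
}

greet world    # hello, world
```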

set -o pipefail

The shell’s “strict mode” is typically defined as set -euo pipefail and allows us to write more reliable shell scripts. Unfortunately, of those three options in the call, -o pipefail is not standard.

There is no good replacement for this option in the standard. Most scripts don’t need it anyway and only specify it because they cargo-culted the “strict mode” definition. But there are a few cases where scripts do rely on the behavior of -o pipefail for proper operation.
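When a script does need to know whether the left-hand side of a pipeline failed, one commonly used POSIX-sh workaround is to have that side report its exit status through an extra file descriptor. The sketch below assumes two hypothetical functions, producer and consumer, standing in for real commands; it is verbose, which is part of why pipefail is so tempting.

```shell
#!/bin/sh
# Hypothetical stand-ins for the two sides of a pipeline.
producer() { printf 'a\nb\n'; return 3; }
consumer() { cat; }

# Emulates "producer | consumer" under pipefail:
# - fd 4 is a copy of the original stdout, so the consumer's output is not
#   swallowed by the command substitution;
# - the producer's exit status is written to fd 3, which is the only thing
#   the command substitution captures.
exec 4>&1
producer_status=$( { { producer; echo $? >&3; } | consumer >&4; } 3>&1 )
consumer_status=$?    # status of the command substitution: the consumer's
exec 4>&-

if [ "$producer_status" -ne 0 ] || [ "$consumer_status" -ne 0 ]; then
    echo "pipeline failed (producer=$producer_status, consumer=$consumer_status)" >&2
fi
```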

echo -e

The echo command (which is also both a built-in Bash command and a standalone tool supplied by GNU Coreutils) accepts the -e flag to enable the interpretation of escape sequences. This lets you do things like echo -e 'foo\n\nbar'. However, as you might expect, this flag is not standard.

echo is ill-defined and dangerous: POSIX leaves its behavior implementation-defined whenever the first argument is -n or any argument contains a backslash, so the same call can print different things on different systems. Instead of fighting it and trying to make it do the right thing across systems, consider using printf (yes, it is a command-line utility too!). printf 'foo\n\nbar\n' would do the same as above, portably and correctly.
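A short sketch of the printf equivalents follows; the messages are arbitrary. One detail worth keeping in mind is that the first argument is a format string, so variable data belongs in the remaining arguments, passed through %s (printed verbatim) or %b (backslash escapes interpreted, the closest standard counterpart of echo -e).

```shell
#!/bin/sh
# Escape sequences in the format string are always interpreted.
printf 'foo\n\nbar\n'

# Data goes through %s, which prints it verbatim: no escapes, no % surprises.
msg='50% done\n'
printf '%s\n' "$msg"        # prints: 50% done\n

# %b interprets backslash escapes in the argument itself.
printf '%b\n' 'foo\n\nbar'  # prints foo, an empty line, then bar
```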

find without a directory

The find command requires one or more directories to be specified before any filters, but the GNU version assumes the current directory if none are given. This means that a command like find -name '*.txt' works on GNU but fails on other systems. It’s trivial to fix such an invocation and make it portable by writing find . -name '*.txt', so just get in the habit of adding that missing dot!
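The sketch below builds a throwaway directory (its name is arbitrary) and searches it with an explicit starting point, which works with any find implementation.

```shell
#!/bin/sh
# Scratch directory for the demo; the name is arbitrary.
demo=${TMPDIR:-/tmp}/find-demo.$$
mkdir -p "$demo/sub" && cd "$demo" || exit 1
touch a.txt b.log sub/c.txt

# Portable: always name the starting point, even when it is just ".".
found=$(find . -name '*.txt' | sort)
printf '%s\n' "$found"    # ./a.txt and ./sub/c.txt

cd / && rm -rf "$demo"
```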

awk gensub()

awk contains various functions to perform string replacements. If you look at the GNU variant of this tool, you will find three: sub, gsub, and gensub. Of these, gensub is the most powerful and, as you can imagine if you have made it this far, it is also the non-standard one.

Given that gensub provides features that sub and gsub lack (notably backreferences such as \\1 in the replacement text, and returning the modified string instead of changing its target in place), there is no trivial replacement for it in general. However, if your code does not need those extra features, calling gensub is a useless use of GNU.
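For example, a global replacement that needs neither backreferences nor a returned copy maps directly onto gsub. The gensub call appears only in a comment; the portable version should run on any POSIX awk.

```shell
#!/bin/sh
# gawk-only: awk '{ print gensub(/-/, "_", "g", $1) }'
# Portable: copy the field, let gsub modify the copy in place, print the copy.
result=$(printf 'a-b-c x\n' | awk '{ s = $1; gsub(/-/, "_", s); print s }')
printf '%s\n' "$result"    # a_b_c
```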

make $(shell …) expansion

make is also a common source of compatibility problems. Relying on GNU Make is a fine thing to do, given that the lowest common denominator for Makefiles is extremely limited feature-wise, but if your needs are simple, there is one thing to watch out for from a compatibility perspective.

In GNU Make, you can use $(shell ...) (where ... is an arbitrary shell command) to execute an external command and incorporate its output into the Makefile. This expansion is performed by GNU Make itself, which allows manipulating the output from within Make (e.g. in conditionals).

However, using this syntax in contexts where only the shell cares about the expanded value is unnecessary. Say you have the following:

CFLAGS = -O2
LIBS = $(shell pkg-config --libs zlib)

foo: foo.c
	cc $(CFLAGS) -o foo foo.c $(LIBS)

Note how the output of pkg-config is only needed by the compiler invocation: make itself doesn’t care about what that execution expands to. As a result, we can rewrite the above like this, delaying pkg-config’s execution until the shell runs, and thus avoid a useless use of GNU:

CFLAGS = -O2
LIBS = $$(pkg-config --libs zlib)

foo: foo.c
	cc $(CFLAGS) -o foo foo.c $(LIBS)

And that’s about it for now. I’m sure I missed other obvious incompatibilities that could be documented here. If you have any in mind, please let me know and I’ll extend the post so that future readers have more information.

Until then… continue writing code, but be aware that if you are working from a Linux system (which, by the way, includes WSL), the GNU userland will easily lock you in whenever you least expect it!

Thanks go to Wesley Moore, Perry E. Metzger and Jason Thorpe for reviewing a draft of this post and providing improvements.

¹ Some people insist that Linux distributions be called GNU/Linux, not just Linux. The reason is that, strictly speaking, Linux is just a kernel and, when you are using a Linux system, you are primarily interacting with the GNU userland. While I sympathize with that idea, I’m not one to adopt the GNU/Linux name, in part because most Linux distributions today ship with software developed by many more vendors than just GNU. In this post, however, I use the GNU/Linux term because saying Linux alone would be unfair to Linux.
² If the Embrace, Extend, and Extinguish catch phrase comes to mind… well… it seems fitting.