If you grew up in the PC scene during the 1980s or early 1990s, you know how painful it was to get hardware to work. And if you did not witness that (lucky you), here is how it went: every piece of hardware in your PC—say a sound card or a network card—had physical switches or jumpers on it. These switches configured the card’s I/O address space, interrupts, and DMA channels, and you had to be careful to select values that did not overlap with those of other cards.
But that wasn’t all. Once you had configured the physical switches, you had to tell the operating system and/or software which specific cards you had and how you had configured them. Remember SET BLASTER=A220 I5 D1 H5? This DOS environment variable told programs which specific Sound Blaster you had installed and which I/O settings you had selected via its jumpers.
Not really fun. It was common to have hardware conflicts that yielded random lock-ups, and thus ISA “Plug and Play”, or PnP for short, was born in the early 1990s—a protocol for the legacy ISA bus to enumerate its devices and to configure their settings via software. Fast-forward to today’s scene where we just attach devices to external USB connectors and things “magically work”.
But how? How does the kernel know which physical devices exist and how does it know which of the many device drivers it contains can handle each device? Enter the world of hardware discovery.
February 28, 2025 · Tags: blogsystem5, hardware, unix
Continue reading (about 16 minutes)
In Unix-like systems, “everything is a file and a file is defined as a byte stream you can open, read from, write to, and ultimately close”… right? Right? Well, not quite. It’s better to say that file descriptors provide access to almost every service the kernel provides, but not that they can all be manipulated with the same quartet of system calls, nor that they all behave as byte streams.
Because you see: network connections are indeed manipulated via file descriptors, but you don’t open them: you bind, listen/accept and/or connect to them. And then you don’t read from and write to network connections: you somehow send to and recv from them. Device drivers are similar: yes, hardware devices are represented as “virtual files” in the /dev hierarchy and many support read and write… but these two system calls are not sufficient to access the breadth of functionality that the hardware drivers provide. No, you need ioctl.
ioctl is the poster child of the system call that breaks Unix’s “everything is a file” paradigm. ioctl is the API that allows out-of-band communication with the kernel side of an open file descriptor. To see cool examples, refer back to my previous article where I demonstrated how to drive graphics from the console without X11: in that post, we had to open the console device, but then we had to use ioctl to obtain the properties of the framebuffer, and then we had to mmap the device’s content for direct access: no reads nor writes involved.
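If you want to see this out-of-band channel in action without writing any code, a quick experiment (my own illustration, not taken from the article) is to trace a terminal utility and watch the ioctls it issues:

$ strace -e trace=ioctl stty size 2>&1 | grep TIOCGWINSZ
# Each printed line is an ioctl(2) call: a file descriptor for the terminal, a
# request code such as TIOCGWINSZ ("get window size"), and a pointer to a
# struct that the kernel fills in. No read(2) or write(2) is involved.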
All the code I showed you in that earlier post was written in C to keep the graphics article to-the-point, but the code I’m really working on is part of EndBASIC, and thus it is all Rust. And the thing is, ioctls are not easy to issue from Rust. In fact, after 7 years of Rust-ing, it’s the first time I’ve had to reach for unsafe code blocks, and there was no good documentation on how to deal with ioctl. So this post aims to fix that by presenting the ways there are to call ioctls from Rust… and, of course, diving a bit deeper into what ioctls actually are.
February 13, 2025 · Tags: blogsystem5, rust, unix
Continue reading (about 14 minutes)
Make, as arcane as a build tool can be, may still be a good first fit for certain scenarios. “Heresy!”, you say, as you hear a so-called “Bazel expert” utter these words.
The specific problem I’m facing is that I need to glue together the NetBSD build system, a quilt patch set, EndBASIC’s Cargo-based Rust build, and a couple of QEMU invocations to produce a Frankenstein disk image for a Raspberry Pi. And the thing is: Make allows doing this sort of stitching with relative ease. Sure, Make is not the best option because the overall build performance is “meh” and because incremental builds are almost impossible to get right… but adopting Bazel for this project would be an almost-infinite time sink.
Anyway. When using Make in this manner, you often end up with what’s essentially a “command dispatcher” and, over time, the number of commands grows and it’s hard to make sense of which one to use for what. Sure, you can write a README.md with instructions, but I guarantee you that the text will get out of sync faster than you can read this article. There is a better way, though.
January 10, 2025 · Tags: blogsystem5, unix
Continue reading (about 7 minutes)
I recently got a Synology DS923+ for evaluation purposes, which led me to set up NFSv4 with Kerberos. I had done this about a year ago with FreeBSD as the host, and going through the process once again reminded me of how painful it is to secure an NFS connection.
You see, Samba is much easier to set up, but because NFS is the native file sharing protocol of Unix systems, I felt compelled to use it instead. However, if you opt for NFSv3 (the “easy default”), you are left with a system that has zero security: traffic travels unencrypted and unsigned, and the server trusts whatever identity the client asserts for its users. Madness by today’s standards. Yet, when you look around, people say “oh, but NFSv3 is fine if you trust the network!” But seriously, who trusts the network in this day and age?
You have to turn to NFSv4 and combine it with Kerberos for a secure file sharing option. And let me tell you: the experience of setting these up and getting things to work is horrible, and the documentation out there is terrible. Most documents are operating-system specific so they only tell you what works when a specific server and a specific client talk to each other. Other documents just assume, and thus omit, various important details of the configuration.
So. This article is my recollection of “lab notes” on how to set this whole thing up, along with the necessary background to understand NFSv4 and Kerberos. My specific setup involves the Synology DS923+ as the NFSv4 server; Fedora, Debian, and FreeBSD clients; and the supporting KDC on a pfSense (or FreeBSD) box.
November 3, 2024 · Tags: blogsystem5, unix
Continue reading (about 24 minutes)
If you read my previous article on DOS memory models, you may have dismissed everything I wrote as “legacy cruft from the 1990s that nobody cares about any longer”. After all, computers have evolved from sporting 8-bit processors to 64-bit processors and, on the way, the amount of memory that these computers can leverage has grown orders of magnitude: the 8086, a 16-bit machine with a 20-bit address space, could only use 1MB of memory while today’s 64-bit machines can theoretically access 16EB.
All of this growth has been in service of ever-growing programs. But… even if programs are now more sophisticated than they were before, do they all really require access to a 64-bit address space? Has the growth from 8 to 64 bits been a net positive in performance terms?
Let’s try to answer those questions; some of the answers are very surprising. But first, some theory.
October 7, 2024 · Tags: blogsystem5, hardware, unix
Continue reading (about 18 minutes)
Over the years, I’ve repeatedly heard that Windows NT is a very advanced operating system and, being a Unix person myself, it has bothered me not to know why. I’ve been meaning to answer this question for years and, now that I finally can, I want to present my findings to you.
My desire to know about NT’s internals started in 2006, when I applied to the Google Summer of Code program to develop Boost.Process. I needed such a library for ATF, but I also saw the project as a chance to learn something about the Win32 API. The journey continued in 2020, when I chose to join Microsoft after a long stint at Google, and in 2021, when I bought the Windows Internals 5th edition book (which I never fully read due to its incredible detail and length). None of these taught me what I really wanted to know, though: the ways in which NT fundamentally differs from Unix, if at all.
September 9, 2024 · Tags: blogsystem5, unix, windows
Continue reading (about 23 minutes)
In a recent work discussion, I came across an argument that didn’t sound quite right. The claim was that we needed to set up containers in our developer machines in order to run tests against a modern glibc. The justifications were that using LD_LIBRARY_PATH to load a different glibc didn’t work and statically linking glibc wasn’t possible either.
But… running a program against a version of glibc that’s different from the one installed on the system seems like a pretty standard requirement, doesn’t it? Consider this: how do the developers of glibc test their changes? glibc has existed for much longer than containers have. And before containers existed, they surely weren’t testing glibc changes by installing modified versions of the library over the system-wide one and YOLOing it.
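For the record, and as a hedged sketch rather than necessarily the route the post takes: a freshly built glibc can be exercised by invoking its dynamic loader directly and pointing it at the new libraries, without touching the system-wide copy or any container. The paths below are placeholders for wherever your glibc build tree lives:

$ /path/to/glibc-build/elf/ld-linux-x86-64.so.2 \
      --library-path /path/to/glibc-build /bin/ls
# The program runs against the just-built loader and libc while the rest of
# the system keeps using the installed ones; glibc's own testrun.sh wrapper
# is built around this same idea.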
August 11, 2024 · Tags: blogsystem5, programming, unix
Continue reading (about 11 minutes)
The SSH agent is a little daemon that holds your private keys in memory. This is particularly handy when your keys are protected by a passphrase: you can unlock and add your keys to the agent once and, from then on, any SSH client such as ssh(1) can interact with the keys without asking you for the passphrase again.
The SSH agent becomes even handier when you primarily work on a remote workstation over SSH. Under these circumstances, you will often need the remote workstation to establish SSH connections to other remote machines (e.g. to contact GitHub). In those situations, you can: copy your private keys to the remote workstation; generate different private keys on the remote workstation; or forward your SSH agent so that the remote workstation can leverage the keys from your client machine without them ever traveling over the network.
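As a quick illustration of that last option (my own sketch, not part of the post; the host name and key file are placeholders): load the key locally and ask ssh to forward the agent connection to the remote workstation:

$ ssh-add ~/.ssh/id_ed25519      # unlock the key once, on the local machine
$ ssh -A workstation             # -A forwards the agent connection to the remote host
workstation$ ssh -T git@github.com
# The remote workstation signs the GitHub challenge through the forwarded
# agent; the private key itself never leaves your local machine.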
November 17, 2023 · Tags: blogsystem5, unix
Continue reading (about 9 minutes)
The GNU project is the source of the Unix userland utilities used on most Linux distributions. Its compatibility with standards and other Unix systems, or lack thereof, directly impacts the overall portability of any piece of software developed from GNU/Linux installations. Unfortunately, the GNU userland does not closely adhere to standards, and its widespread usage causes little incompatibilities to creep into any software created on GNU/Linux systems. Read on for why this is a problem and the pitfalls you will encounter.
August 25, 2021 · Tags: opinion, programming, shell, unix
Continue reading (about 12 minutes)
Let’s continue our dive into the very interesting topic of how Unix (or Linux or what have you) and Windows differ regarding argument processing. And by that I mean: how a program (the caller) communicates the set of arguments to pass to another program (the callee) at execution time, how the callee receives such arguments, and what the consequences of each design are.
November 2, 2020 · Tags: unix, windows
Continue reading (about 12 minutes)
The way PowerShell handles flags in scripts (aka cmdlets) differs completely from what Unix shells do. These differences allow PowerShell to gain insight into how scripts have to be executed, which in turn can deliver a better interactive user experience. Read on for a comparison while wearing Unix-tinted glasses.
October 28, 2020 · Tags: powershell, unix, windows
Continue reading (about 7 minutes)
In the previous post, we saw how .d directories permit programmatic edits to system-wide configuration with ease. But this same concept can be applied to other kinds of tracking. Let’s dive into a few examples ranging from desktop menu entries to the package manager’s database itself.
August 21, 2020 · Tags: debian, menu2wm, netbsd, unix
Continue reading (about 12 minutes)
Have you ever wondered why an increasing number of programs are configured by placing small files in .d directories instead of by just editing a single file? Have you ever wondered why these .d directories seem to proliferate in Linux installations? Read on to understand what these are and why they are useful.
August 17, 2020 · Tags: debian, featured, netbsd, unix
Continue reading (about 9 minutes)
In the previous posts, we saw why waiting for a process group is complicated and we covered a specific, bullet-proof mechanism to accomplish this on Linux. Now is the time to investigate this same topic on macOS. Remember that the problem we are trying to solve (#10245) is the following: given a process group, wait for all of its processes to fully terminate.
macOS has a bunch of fancy features that other systems do not have, but process control is not among them. We do not have features like Linux’s child subreaper or PID namespaces to keep track of process groups. Therefore, we’ll have to roll our own. And the only way to do this is to scan the process table looking for processes with the desired process group identifier (PGID) and to wait until they are gone.
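A rough shell-level sketch of that idea (my own, and much cruder than whatever the post ends up implementing inside Bazel’s process wrapper; $pgid is a hypothetical variable holding the group’s identifier):

# Poll the process table until no process carries the target PGID anymore.
while pgrep -g "$pgid" >/dev/null; do
    sleep 1     # pgrep -g matches processes by process group ID
done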
November 15, 2019 · Tags: bazel, darwin, macos, unix
Continue reading (about 8 minutes)
In the previous post, we saw why waiting for a process group to terminate is important (at least in the context of Bazel), and we also saw why this is a difficult thing to do in a portable manner. So today, let’s dive into how to do this properly on a Linux system.
On Linux, we have two routes: using the child subreaper feature or using PID namespaces. We’ll focus on the former because that’s what we’ll use to fix the process wrapper (#10245), and because it is sufficient to fully address our problem.
November 14, 2019 · Tags: bazel, linux, unix
Continue reading (about 4 minutes)
Process groups are a feature of Unix systems to group related processes under a common identifier, known as the PGID. Using the PGID, one can look for these related processes and send signals to them in unison. This is typically used by shell interpreters to manage processes.
For example, let’s launch a shell command that puts two sleep invocations in the background (those with the 10- and 20-second delays) and then sleeps in the direct child (with a 5-second delay)—while also putting the whole invocation in the background so that we can inspect what’s going on:
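The exact command appears in the full post; a sketch of what it looks like (the details here are my assumption) would be:

$ (sleep 10 & sleep 20 & sleep 5) &    # subshell: two background sleeps, then a 5-second sleep in the direct child
$ ps -o pid,pgid,command | grep '[s]leep'
# All three sleep processes report the same PGID: the one assigned to the
# backgrounded subshell that spawned them.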
November 12, 2019 · Tags: bazel, unix
Continue reading (about 6 minutes)
As strange as it may sound, a very important job of any build tool is to orchestrate the execution of lots of other programs—and Bazel is no exception.
Once Bazel has finished loading and analyzing the build graph, Bazel enters the execution phase. In this phase, the primary thing that Bazel does is walk the graph looking for actions to execute. Then, for each action, Bazel invokes its commands—things like compiler and linker invocations—as subprocesses. Under the default configuration, all of these commands run on your machine.
November 8, 2019 · Tags: bazel, internals, unix
Continue reading (about 6 minutes)
The current working directory, or CWD for short, is a process-wide property. It is good practice to treat the CWD as read-only because it is essentially global state: if you change the CWD of your process at any point, any relative paths you might have stored in memory will stop working. I first learned this many years ago when using the Boost.Filesystem library: I could not find a function to change the CWD, and that was very much intentional for this reason.
September 21, 2019 · Tags: programming, unix
Continue reading (about 3 minutes)
Many programming guides recommend beginning scripts with the #! /usr/bin/env shebang in order to automatically locate the necessary interpreter. For example, for a Python script you would use #! /usr/bin/env python and then, so the saying goes, the script would “just work” on any machine with Python installed.
The reason for this recommendation is that /usr/bin/env python will search the PATH for a program called python and execute the first one found… and that usually works fine on one’s own machine.
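A tiny demonstration of the mechanism (my own; the script name and the extra PATH entry below are made up):

$ cat hello.py
#! /usr/bin/env python
print("hello")
$ command -v python                        # env will run whatever a PATH search finds first...
$ PATH=/opt/otherpython/bin:$PATH ./hello.py   # ...so a different PATH silently picks a different interpreter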
September 14, 2016 · Tags: featured, portability, programming, scripts, unix
Continue reading (about 5 minutes)
If you write shell scripts, you definitely need to know about two nice features that can be enabled through the set builtin:
set -e: Enables checking of all commands. If a command exits with an error and the caller does not check that error, the script aborts immediately. Enabling this will make your scripts more robust. But don’t wait until your script is “complete” to set the flag as an afterthought, because it will be a nightmare to fix the script to work with this feature enabled. Just write set -e as the very first line of your code; well… after the shebang.
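A minimal sketch of the difference (my own example, not from the post):

#! /bin/sh
set -e
cp /nonexistent-file /tmp/copy   # this command fails...
echo "never reached"             # ...and set -e aborts the script right here instead of blindly continuing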
January 24, 2010 · Tags: unix
Continue reading (about 1 minute)
If you have ever run ls on a directory whose contents don't fit on screen, you may have tried to list only a part of it by passing a wildcard to the command. For example, if you were only interested in all directory entries starting with an f, you might have tried ls f*. But did that do what you expected? Most likely not, if any of those matching entries was a directory. In that case, you might have thought that ls was actually recursing into those directories.
Let's consider a directory with two entries: a file and a directory. It may look like:

$ ls -l
total 12K
drwxr-xr-x 2 jmmv jmmv 4096 Dec 19 15:18 foodir
-rw-r--r-- 1 jmmv jmmv 0 Dec 19 15:18 foofile

The ls command above was executed inside our directory, without arguments, hence it listed the current directory's contents. However, if we pass a wildcard we get more results than expected:

$ ls -l *
-rw-r--r-- 1 jmmv jmmv 0 Dec 19 15:18 foofile

foodir:
total 4K
-rw-r--r-- 1 jmmv jmmv 0 Dec 19 15:19 anotherfile

What happened in the previous command is that the shell expanded the wildcard; that is, ls never saw the special character itself. In fact, the above was internally converted to ls -l foofile foodir and this is what was actually passed to the ls utility during its execution. With this in mind, it is easy to see why you got the contents of the sample directory too: you explicitly (although somewhat "hidden") asked ls to show them. If what you really wanted was to list the matching entries themselves, and not the contents of any matching directories, tell ls so with the -d flag:

$ ls -l -d *
drwxr-xr-x 2 jmmv jmmv 4096 Dec 19 15:19 foodir
-rw-r--r-- 1 jmmv jmmv 0 Dec 19 15:18 foofile

Update (21st Dec): Fixed the first command shown as noted by Hubert Feyrer.
December 19, 2006 · Tags: unix
Continue reading (about 2 minutes)
This article first appeared on this date in O’Reilly’s ONLamp.com online publication. The content was deleted sometime in 2019 but I was lucky enough to find a copy in the WayBack Machine. I reformatted the text to fit the style of this site and fixed broken links, but otherwise the content is a verbatim reproduction of what was originally published.
The Apache HTTP Server is the most popular web server due to its functionality, stability, and maturity. However, this does not make it suitable for all uses: slow machines and embedded systems may have serious problems running it because of its size. Here is where lightweight HTTP servers come into play, as their low-memory footprints deliver decent results without having to swap data back to disk.
October 13, 2005 · Tags: featured, onlamp, unix, web
Continue reading (about 11 minutes)
This article first appeared on this date in O’Reilly’s ONLamp.com online publication. The content was deleted sometime in 2019 but I was lucky enough to find a copy in the WayBack Machine. I reformatted the text to fit the style of this site and fixed broken links, but otherwise the content is a verbatim reproduction of what was originally published.
My previous article, Making Packager-Friendly Software (part 1), explains why software packaging is sometimes problematic due to real problems in the mainstream sources. It also discusses many issues that affect the distribution files and the configuration scripts (the most visible items when trying out a new program). This part explores the problems found in the build infrastructure and the code itself.
April 28, 2005 · Tags: featured, netbsd, onlamp, programming, unix
Continue reading (about 15 minutes)
This article first appeared on this date in O’Reilly’s ONLamp.com online publication. The content was deleted sometime in 2019 but I was lucky enough to find a copy in the WayBack Machine. I reformatted the text to fit the style of this site and fixed broken links, but otherwise the content is a verbatim reproduction of what was originally published.
A package maintainer, or packager, is a person who creates packages for software projects. He eventually finds common problems in these projects, resulting in a complex packaging process and a final package that is a nightmare to maintain. These little flaws exist because in most cases the original developers are not packagers, so they are not aware of them. In other words, if you do not know something is wrong, you cannot fix it.
March 31, 2005 · Tags: featured, netbsd, onlamp, programming, unix
Continue reading (about 20 minutes)