My introductory post to the FreeBSD test suite sparked a comment on Hacker News asking what exactly it means to test an operating system. I felt that was a great question, worth answering for those not involved in the development of an OS, so here is a possible answer.

Testing an OS may mean many different things depending on whom you ask, all of them valid and complementary. Here are some of the areas I can think of:

  • User-space tools: This is probably the easiest area of all. Implementing unit tests (and even integration tests) for the code of user-space tools and libraries is relatively straightforward.
  • User-space tools that talk to the kernel: True, all user-space tools talk to the kernel. Otherwise, they'd be pure computation cores with no data to compute on and nowhere to write their results. However, some interactions with the kernel can be overlooked for testing purposes. For example, when writing a test for ls(1), one can populate a test directory and use that as the test case for listings without having to worry about the internals of the backing file system. When testing ping(1), however, things get messier because you just cannot "talk to the network" if you want a reliable and fast test; instead, you need to come up with alternative testing mechanisms that involve setting up a fake network, and that, my friend, can be a really tricky thing to do.
  • Feature regressions: While regression testing could be seen as a special case of unit and integration testing, I am listing it separately because it is particularly important for an OS. When fixing a bug, especially one that can make user-space applications or the kernel misbehave or crash, you really want to have a test that, first, exposes the problem and, second, prevents such a bug from recurring in the future. In fact, this is the mentality behind the majority of the tests that previously existed in the NetBSD test suite, and it is why the directory that contained them was appropriately named regress.
  • High-level kernel subsystems: This involves ensuring that high-level subsystems within the kernel (like file systems, locking or the network stack) behave as intended. Such testing should be no different from the testing of user-space tools, but it often is because of the unnatural distinction between kernel code and user-space code. The Anykernel philosophy pushed by Antti Kantee is a great solution to this problem, but there are other approaches as well. There is also a lot of research into how best to test each of these subsystems, because each may be subject to different risks in the face of failures.
  • Kernel drivers: Testing these is trickier than testing high-level kernel subsystems because actual hardware may be involved. Using a simulator only takes one so far because there is no guarantee that the simulator matches the hardware, especially since real hardware is, more often than not, buggy.
  • Performance targets and regressions: Lastly, another form of testing involves ensuring that the system is performant. Such tests can validate that a newly introduced optimization really delivers what is expected, and they should also ensure that the system does not regress over time in critical areas (e.g. network throughput for a particular workload).
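To make the ls(1) approach described above a bit more concrete, here is a minimal sketch in Python. It assumes an `ls` binary on PATH that supports the POSIX `-1` flag; the function name and the fixture file names are hypothetical, and this is only an illustration of the idea, not code taken from either the NetBSD or the FreeBSD test suite.

```python
import os
import subprocess
import tempfile


def ls_lists_populated_directory():
    """Hypothetical test: populate a scratch directory and check ls(1) output.

    Assumes `ls` is on PATH. The file system backing the temporary
    directory is irrelevant to the test, which is the whole point.
    """
    expected = ["alpha", "beta", "gamma"]  # hypothetical fixture names
    with tempfile.TemporaryDirectory() as workdir:
        for name in expected:
            # Create empty regular files as the test fixture.
            with open(os.path.join(workdir, name), "w"):
                pass
        # -1 forces one entry per line, whether or not stdout is a tty.
        result = subprocess.run(
            ["ls", "-1", workdir],
            capture_output=True,
            text=True,
            check=True,
        )
    listed = result.stdout.splitlines()
    # ls sorts its output; these lowercase ASCII names sort the same way
    # regardless of locale.
    assert listed == sorted(expected), "unexpected listing: %r" % listed
    return listed
```

The scratch directory isolates the test from whatever the machine's real file systems contain. There is no equally cheap trick for ping(1): a "scratch network" requires considerably more machinery to set up.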

From the above, you will notice that many areas worth testing do not require any special features: any traditional testing tool will do. However, when it comes to testing, the key difference between an OS and a regular application boils down to one component and one component only: the kernel. Anything having to do with the kernel, be it test case preparation or actually exercising specific in-kernel features, is special in some form. (The fact that it is special is, again, a historical mistake worth fixing.)

The NetBSD Test Suite

The test suite for the NetBSD operating system currently contains tests for pretty much all the areas mentioned above, covering many different components of the system. There are tests for many userland utilities, a lot of tests for in-kernel subsystems and some tests for specific bug reports.

The best selling point of this test suite is probably its ability to poke at kernel internals and test them directly by means of rump. This is something very specific to NetBSD.

The only area not covered by the test suite so far is performance. Performance testing is a vastly different world from conformance testing, and neither ATF nor Kyua is well suited for the former.

The FreeBSD Test Suite

This, again, is a very new project and, as such, the coverage of its test suite is still very limited. You should expect it to follow NetBSD's track regarding the areas covered, but it will take a while to catch up.

However, the major drawback in writing tests for FreeBSD is the lack of rump. Testing kernel-level code is pretty much impossible without such a framework. Solving this remains an open problem on which no work has been done yet.

Hope that clarifies the original question!