If you have followed Windows 10 at all during the last few years, you know that the Windows Subsystem for Linux, or WSL for short, is the hot topic among developers. You can finally run your Linux tooling on Windows as a first class citizen, which means you no longer have to learn PowerShell or, god forbid, suffer through the ancient CMD.EXE console.

Unfortunately, not everything is as rosy as it sounds. I now have to do development on Windows for Windows as part of my new role within Azure… and the fact that WSL continues to be separate from the native Windows environment shows. Even though I was quite hopeful, I cannot use WSL as my daily driver because I need to interact with “native” Windows tooling.

I believe things needn’t be this way, but with the recent push for WSL 2, I think that the potential of an alternate world is now gone. But what do I mean with this? For that, we must first understand the differences between WSL 1 and WSL 2 and how the push for WSL 2 may shut some interesting paths.

A blog on operating systems, programming languages, testing, build systems, my own software projects and even personal productivity. Specifics include FreeBSD, Linux, Rust, Bazel and EndBASIC.

0 subscribers

DISCLAIMER: I have zero insight on what’s going on within the WSL team or what their future plans are. This is purely my personal opinion based on what I have experienced as a user.

Review of the WSL 1 architecture

Let’s first take a look at WSL 1, and for that, we must look at what’s in the awkward name. Why was this feature named Windows subsystem… for Linux? Isn’t that backwards? This is not a subsystem in Linux to do anything Windows-related; it’s the other way around!

Well… you see, the name is technically correct when considering the design of the Windows NT kernel. From the Architecture of Windows NT page in the Wikipedia, we find (emphasis mine):

User mode in Windows NT is made of subsystems capable of passing I/O requests to the appropriate kernel mode device drivers by using the I/O manager. The user mode layer of Windows NT is made up of the “Environment subsystems”, which run applications written for many different types of operating systems, and the “Integral subsystem”, which operates system-specific functions on behalf of environment subsystems. The kernel mode stops user mode services and applications from accessing critical areas of the operating system that they should not have access to.

Windows NT was designed from the ground up to support running processes from a multitude of operating systems, and Win32 was “just” one of those environment subsystems. With these solid foundations, WSL 1 supplies a new environment subsystem, the Linux subsystem, to run Linux binaries atop the Windows NT kernel. Both the Win32 and Linux environment subsystems share the common integral subsystem.

Mumble jumbo. What does any of that actually mean?

Different system call “front-ends”—that’s what it means. A user-space process is a collection of binary instructions that the processor executes uninterruptedly (leaving interrupts aside). The operating system’s kernel is unaware of what the process is doing until the process issues a system call: at that point, the kernel regains control to perform an operation on behalf of the user, which can be something like reading a file or pausing for a few seconds.

The way a process issues system calls, and the semantics of those system calls, are specific to the operating system. For example, on old x86: opening a file on Win32 is system call number 17h invoked via INT 2Eh whereas opening a file on Linux is system call number 5h invoked via INT 80h.

But… conceptually, opening a file is opening a file, right? The fact that the system call numbers or the software interrupt numbers are different among them is not particularly interesting. And hereby lies the key design aspect of WSL 1: the Linux subsystem in the NT kernel is, simply put, an implementation of Linux’s system call layer in front of the NT kernel. These system calls later delegate to NT primitives, not Win32 calls. Which is important to repeat: there is no translation from Linux to Win32 system calls.

This is a feat of engineering considering how generally good support for Linux apps got to be under WSL 1 and the many ways in which NT internally differs from Unix, fork+exec being the eternal archenemy.

The true beauty of this design is that there is a single kernel running on the machine, and this kernel has a holistic view of all the processes beneath it. The kernel knows everything about the Win32 and Linux processes. And these processes all interact with unified resources, such a single networking stack, a single memory manager, and a single process scheduler.

The reasons for WSL 2

If WSL 1 is so cool, then why does WSL 2 exist? Two reasons:

WSL 1 has to, essentially, implement all of Linux’s kernel ABI, “bit by bit”. If there is a bug in that interface, the WSL 1 has to replicate it. And if there is a feature that is difficult to represent within the NT kernel, either the feature cannot be implemented or it needs extra kernel logic (and thus becomes slower).
Linux subsystem in WSL 1 has to abide by any “limitations” and inherent differences that exist between the NT kernel and the traditional Unix design. The most obvious one is the NTFS file system and its semantics, and how these differences harm performance of Linux binaries. Poor file system performance seems to be a common complaint in WSL 1.

WSL 2 “throws away” all of the Linux subsystem parts of the name and replaces everything with a full-blown (but very well-hidden and fast) virtual machine. The virtual machine then runs a proper Linux kernel, a proper Linux file system, and a proper Linux networking stack within it.

What this means is that the beauty of the WSL 1 design is gone: the Windows NT kernel doesn’t get to see anything that happens within the Linux world any more. All it knows is that there is a big black box that does “stuff” inside, and all it gets to see are the VMENTER and VMEXIT hook points for virtual machines and block-level read/write requests on a virtual disk. The NT kernel is now unaware of Linux processes and file accesses. Similarly, the Linux kernel is unaware of anything in NT land.

You can read about some more differences in the official documentation.

The lost potential

From the user’s point of view, WSL 2 feels strictly better: Linux apps now run much, much faster because they are not subject to awkward Linux system call “emulation” within the NT kernel. If using NTFS with Linux semantics is difficult, that’s no problem because the Linux environment now uses ext4 on a virtual disk. And support for Linux apps can be much more complete, because, well, WSL 2 is Linux: if you want FUSE, to name something, you got it.

But this comes at the cost of what WSL could have been:

Can you imagine how cool it would be if you could type ps or top within a WSL session and see Linux and Windows processes side-by-side, able to mutate their state with kill?
Can you imagine how cool it would be to manipulate Windows services from the WSL session?
Can you imagine how cool it would be if you could use ifconfig (wait, is that ip? 🙄) within a WSL session to inspect and modify the machine’s network interfaces?
Essentially, can you imagine doing all of your Windows system administration tasks from within WSL?

Although this never existed, I can well imagine such a world… and it’s one that only the WSL 1 design can provide. And the reason I can imagine this is because macOS gives you this model (albeit cheating because macOS is essentially Unix).

Which is what brings me to my frustration: even though I could install WSL on my development machine for Azure, there is nothing I can use it for. I still have to interact with the system via CMD.EXE because I have to deal with Windows-native processes and resources, and because the tooling I have to deal with is Windows-only.

The FAQ for WSL 2 claims that WSL 1 will not be abandoned, and if we abide by Microsoft’s backwards-compatibility guarantee, that may be true. But keeping WSL 1 running is a monumental effort due to the need to keep up with Linux changes. Regardless, I hope that this is the case and that WSL 1 continues to exist. Who knows, maybe the reason WSL 1 stays behind is to pursue this magical world I’m describing? 🤔

Addendum: Linux compatibility in the BSDs

I can’t finish this post without talking about the various BSDs. The BSDs, always trailing behind Linux and other commercial operating systems, have had binary-level compatibility for ages. The earliest I can find is Linux compatibility in the NetBSD kernel back in 1995. That’s 25 years ago, and 21 before WSL 1’s first debut.

And, heck, this isn’t limited to Linux. NetBSD has had support to emulate various different operating systems throughout the years. SVR4 support appeared in 1994 and, for a brief stint, NetBSD even had support for… 🥁… PE/COFF binaries—that’s right, Win32 binaries. So, in a way, NetBSD implemented the WSL 1 model in reverse: it let you run Win32 binaries atop the NetBSD kernel back in 2002.