“Fast machines, slow machines”… ah, the post that spawned this series. As I frantically typed that article while replying to angry tweets, a thought came to mind: software engineering as a whole is hyper-focused on lowering the cost of writing new code, yet it disregards the costs that these improvements impose on other disciplines in a company or even on end users.
So, in this series finale, I want to compare how some choices that apparently lower development costs actually increase costs elsewhere. I also want to highlight how, if we made different decisions during development, we could possibly expose those extra costs early on. This is beneficial because exposing costs upfront allows us to make tough choices when there is still a chance of changing course.
To make things specific, I will look at how the use of modern frameworks that facilitate development can end up hurting performance, reliability, and usability. So let’s start with a three-part rant first (sorry) and then let’s look at what we might do.
First, we have performance problems caused by the layers upon layers of leaky abstractions that frameworks add.
Every layer of abstraction that we add to a piece of software hurts performance: each layer adds code and, with very few exceptions, more code requires more cycles to build and run. As an example, think about the ever-increasing network round trips that the simplest interaction with an app has to perform and how poorly these degrade with bad network conditions. Or think about how slowly a run-of-the-mill Electron app starts and how much disk space it takes.
Sadly, this piling of abstraction layers has been happening for years. One possible rationale: the benefits of adding one more layer look great on paper and the incremental cost of such a layer tends to be small, so the cost is easy to justify. It makes sense every time. Unfortunately, when many of these seemingly-small costs compound, systems become sluggish.
“But the developers saved some time!” I hear… while these savings in coding costs transform into everyone else requiring more powerful machines over time. End users have to upgrade their phones and laptops periodically just to keep up with the software bloat treadmill, and what they can do with their newer hardware isn’t massively different from what they could do with the iteration that came right before.
Plus these slowdowns impact production servers as well, not just end users, and the extra costs in the datacenter are orders of magnitude larger than what a single user will experience. I’m still shocked by how, for example, Google has insanely-fast internal infrastructure… yet those incredible systems exist to support huge binaries and highly-coupled micro-services that maybe shouldn’t have existed in their current form. For example, we did have discussions in the Bazel team about adding limits to what a build should support, and we did start measuring costs to try to address those… but it was too late to tame the beast.
Second, we have “DevOps problems” caused by the “easy-to-use” frameworks and their tooling.
The fact that writing code is easier than before does not necessarily mean that deploying and maintaining the resulting systems is easier too. In fact, the opposite tends to happen: these days it sounds inconceivable to launch a service on just one machine, while it was the norm not so long ago. “How will it scale to billions of users? How will it have 100 9s of reliability?” everyone asks, without facing the reality that scaling needs may never arise or that occasional downtime is acceptable.
Instead, we adopt languages with complex runtimes and fragile, dog-slow tooling, and we push micro-service architectures from the get-go. We end up with systems that require cluster orchestrators like Kubernetes, distributed storage, messaging queues, complex monitoring systems, containers… or, in other words, a myriad of dependencies, each needing a different language runtime, deployment practices, and operational checklists. Running these systems now requires multiple large SRE teams.
Paradoxically, I would even say that the risk of downtime in these often-over-engineered systems is higher than in the simpler alternatives. Operating a single machine exposed the cost of needing reliable hardware, power, and a few sysadmins, while operating large distributed systems hides such costs behind “unavoidable” cloud bills, confusing reporting structures, and a bunch of poorly-run support rotations. But hey, these problems are so detached from the initial coding activities (and sometimes from the developers themselves!) that it’s hard to think about the consequences of favoring certain languages or frameworks.
And third, we have extra usability costs caused by unification where unification wasn’t asked for.
The obvious example here is the push towards single codebases that can run on the web, iOS, and Android, on anything from large wide-screen monitors to tiny portrait phone screens. Developers rejoice in their ability to share code (they can ship faster!) but… are users happy? Apps are now their own silos that behave differently from all others and don’t integrate with the platforms they run on. “Too much whitespace” is a common cry.
Now, don’t get me wrong. I am a developer too, and of course I like frameworks that allow me to avoid code duplication. In fact, code duplication is a problem from a usability perspective too because bugs and features will differ in different versions of the same app. But why should we, the users, pay for a loss of platform uniformity and usability so that companies can ship a product faster?
Anyhow, enough for the rant.
What can we do about this? I’m not sure there is much we can do. The incentives just aren’t there, as Luke Plant claims in “No one actually wants simplicity”. And even if we wanted to do something, we may not be able to, as Yossi Kreinin describes in “Don’t ask if a monorepo is good for you—ask if you’re good enough for a monorepo”.
But here is the thing: it is good that building prototypes for new apps and features is cheaper and faster than ever before. Companies can quickly try and validate new products and features. Solo developers can launch apps in just a few days and have them reach thousands or millions of people. Yet… do the benefits really last? These initial cost-saving measures end up hiding bigger costs down the road. Initial prototypes are never thrown away in favor of a rewrite—as everyone says you should really do—and once the ball of mud grows, it’s too expensive and too late to tame it.
Another problem is that most engineers haven’t done any performance work. It is common, based on my observations in dozens of interviews, to believe that performance is about big-O notation. But, usually, that doesn’t matter. What matters to deliver a great user experience lies in other dimensions like minimizing I/O operations, tuning indexes in a database, caring about cache locality, or keeping binary sizes under control. There is a real need for mentoring… but these activities are rarely rewarded organizationally.
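To make the I/O point concrete, here is a minimal sketch, assuming a hypothetical in-memory `CountingSink` that stands in for a file or socket and simply counts how many write operations reach it. Both writers below are O(n) in the number of records, so big-O notation cannot tell them apart, yet one issues an operation per record while the other batches records through the standard library's `io.BufferedWriter`. The record size, record count, and buffer size are made-up numbers chosen only for illustration.

```python
import io


class CountingSink(io.RawIOBase):
    """Hypothetical sink: counts write operations instead of doing real I/O."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.bytes_written = 0

    def writable(self):
        return True

    def write(self, data):
        # Each call here stands in for one system call or network send.
        self.calls += 1
        self.bytes_written += len(data)
        return len(data)


def write_unbuffered(sink, records):
    """O(n): one write operation per record."""
    for record in records:
        sink.write(record)


def write_buffered(sink, records, buffer_size=64 * 1024):
    """Also O(n), but records are batched before they reach the sink."""
    buffered = io.BufferedWriter(sink, buffer_size=buffer_size)
    for record in records:
        buffered.write(record)
    buffered.flush()
    buffered.detach()  # Leave the sink open so its counters stay readable.


# Illustrative workload: 10,000 records of 100 bytes each.
records = [b"x" * 100 for _ in range(10_000)]

unbuffered_sink = CountingSink()
write_unbuffered(unbuffered_sink, records)

buffered_sink = CountingSink()
write_buffered(buffered_sink, records)

print("unbuffered write operations:", unbuffered_sink.calls)
print("buffered write operations:  ", buffered_sink.calls)
```

Both runs deliver the same bytes, but the buffered one reaches the sink far fewer times. On a real disk or network, every one of those operations carries a fixed cost that asymptotic analysis ignores.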
I would ask that, if you happen to do project planning or headcount allocation, you do not treat coding as special. Yes, coding is important, but the cost of writing new code is only a small fraction of the cost of delivering a product. Once a product is past a certain size, all other costs like refactoring or servicing become more important, and the costs that were saved by easing coding end up weighing on everything else. And, please, remember the impact that these choices have on end user performance.
Let’s end on a positive tone because we do have some nice things.
I’m happy that Go has brought back the idea that trivial deployments and software distribution are beneficial thanks to its push for static binaries. Developers have lost some of their freedom by how opinionated Go is, but everyone else has gained something.
I’m happy that some companies push for homogenization to reduce operational costs at the expense of limiting development choices. See how Google is famous for only allowing certain programming languages in production services, or how Snowflake is adopting Bazel to remove moving parts from the build process. These actions reduce developer choice (a cost to them) but bring savings elsewhere.
And I’m happy that Rust’s memory safety and zero-cost abstractions increase initial development cost in exchange for faster and more reliable apps for end users. Oh, and it reduces future maintenance costs for developers too! Refactorings are a joy to execute in a Rust code base.
Now pardon me while I go back to work unironically on my framework.