Showing 6 posts
In software engineering organizations, there are certain practices that keep costs under control even if those seem more expensive at first. Unfortunately, because such practices feel more expensive, teams choose to keep their status quo even when they know it is suboptimal. This choice ends up hurting productivity and morale because planned work is continuously interrupted, which in turn drags project completion. The reason I say seem and not are is because the alternatives to these cost-exposing practices also suffer from costs. The difference is that, while the former surface costs, leading to the need to allocate time and people to infrastructure work, the latter keeps the costs smeared over teams and individuals in ways that are difficult to account and plan for. To illustrate what I’m trying to say, I’ll present three different scenarios in which this opinion applies. All of these case studies come from past personal experiences while working in different teams and projects. The first one covered in this post is about the adoption of a monorepo vs. the use of multiple different repositories. The other two will come in follow-up articles.
Companies grow, and with them do the software projects that support them. It should be no surprise that larger programs require longer build times. And, if I had to guess, you have seen how those build times eventually grow to unbearable levels, reducing productivity and degrading quality. In this post, I examine how we can leverage the common techniques we use for production services—namely SLIs and SLOs—to keep build times on track.
Monorepos are an interesting beast. If mended properly, they enable a level of uniformity and code quality that is hard to achieve otherwise. If left unattended, however, they become unmanageable monsters of tangled dependencies, slow builds, and frustrating developer experiences. Whether you have a good or bad experience directly depends on the level of engineering support behind the monorepo. Simply put, monorepos require dedicated teams and tools to run nicely. In this post, I will look at how almost-perfect caching plays a key role in keeping build times manageable under such an environment.
During my 11 years at Google, I can confidently count the number of times I had to do a “clean build” with one hand: their build system is so robust that incremental builds always work. Phrases like “clean everything and try building from scratch” are unheard of. So… you can color me skeptical when someone says that incremental build problems are due to bugs in the build files and not due to a suboptimal build system. The answer lies in having a robust build system, and in this post I’ll examine the common causes behind incremental build breakages, what the build system can do to avoid them, and how Bazel accomplishes most of them.
Bazel’s original raison d’etre was to support Google’s monorepo. A consequence of using a monorepo is that some builds will become very large. And large builds can be very resource hungry, especially when using a tool like Bazel that tries to parallelize as many actions as possible for efficiency reasons. There are many resource types in a system, but today I’d like to focus on the number of open files at any given time (nofiles).
Blaze—the variant of Bazel used internally at Google—was originally designed to build the Google monorepo. One of the beauties of sticking to a monorepo is code reuse, but this has the unfortunate side-effect of dependency bloat. As a result, Bazel and Blaze have evolved to support ever-increasingly-bigger pieces of software. The growth of the projects built by Bazel and Blaze has had the unsurprising consequence that our engineers all now have high-end workstations with access to massive amounts of distributed resources.