During my 11 years at Google, I can confidently count the number of times I had to do a “clean build” with one hand: their build system is so robust that incremental builds always work. And when I say always, I really mean always. Phrases like “clean everything and try building from scratch” are unheard of¹. So… you can color me skeptical when someone says—a I’ve recently heard—that incremental build problems are due to bugs in the build files (Makefile, CMakeLists.txt, or what have you) and not due to a suboptimal build system.

And, truth be told, they are right: incremental build failures are indeed caused by bugs in the build files and such bugs abound. The problem is that the vast majority of engineers don’t give a $#!& about the build system. They rightfully want their code to compile, and they’ll do whatever it takes—typically copy/pasting large amounts of code—to coerce the build system into doing what they need. Along this path, they introduce subtle bugs which then lead to strange build problems. Scale that behavior up to tens, hundreds, or thousands of engineers… and any one little bug balloons. Thus the “run a clean build” mantra is born.

The same is true at Google in this regard. While the majority of Google engineers praise their build system, most don’t care about mending it particularly well. These engineers will also copy/paste large amounts of Starlark code just to make things “work”, because this is the reasonable thing for them to do.

And yet… in spite of all the abuse… clean builds are not necessary at Google to keep the machine ~~running~~ building. So, how is this possible? How is Google’s build system resilient to thousands of engineers modifying build files in a gigantic monorepo, most of them without truly understanding what goes under the hood?

The answer lies in the build tool itself: Bazel. Of course Google engineers also make mistakes in their build files. All of us do. But, when those mistakes happen, the build tool refuses to build the code upfront without giving the appearance of success. In other words: the problems that cause incremental builds to fail are real problems and the system surfaces them early on in any build².

To make this possible, the build tool must know, in a fool-proof and perfect manner, when a rule³ (such as a compiler invocation) has to be re-executed. This decision must account for all possible factors that influence the output of the rule.

Sounds simple, right? Indeed it does: this is a very simple concept in theory and most build tools claim to adhere to it. The devil lies in the details, though, and in practice most tools don’t get those details right. But when you do get the tool right, a cultural shift happens. People start trusting that the tool is correct, and when they trust that it is, their expectations and behavior changes. “Do a clean build” is no longer a solution that works, so they get to take a second look at their own build rules, fix them, and learn better practices along the way.

In this post, I want to take a look at common failure modes that are often fixed by running clean builds. For each of them, I will describe how a good build tool addresses them and I’ll refer back to Bazel for my examples because Bazel at Google proves that such a utopian system exists. Rest assured, however, that the concepts and ideas are applicable to any system and possibly in ways that differ from what Bazel does.

Undeclared dependencies

The first and most common problem that breaks incremental builds is caused by undeclared dependencies. The scenario goes like this: you do a first build, then modify a file, and then do a second build. This second build does some “stuff” but the changes you made are not reflected in the final artifacts. What happened?

Simply put: the build system didn’t know that the file you modified was part of the build graph. The file was indeed used in the build by some tool along the process, but the build system was oblivious to this fact because the rule that ran that tool didn’t specify all used files as inputs.

This is a very common and subtle problem. Say you have a rule to build a single .c source file. Because of #include directives, it is pretty obvious that this rule has to specify all included files as inputs. But this rule also has to specify the dependencies of those includes, and the dependencies of those dependencies, and so on. The build rule must account for the full transitive closure of the include files to be accurate. “Ah!”, I hear you say, “Most build systems are aware of this and use the C preprocessor to extract such list, so they are correct!”. Yes, mostly. But… did they account for the fact that the compiler binary itself is also an input to the rule? Most likely they did not. And of course this is only about C where the file inclusion problem is well-understood… but what about the many other languages you might encounter?

The point is: it is very hard to know, on a rule by rule basis, what all the necessary inputs are for its execution. And if the rule misses any of these inputs, the undeclared dependencies problem will be lurking to bite you (much) later. Which, again, is a bug in your build files: you should have stated the inputs to a rule correctly upfront; right?

Right. So why didn’t the build system catch this situation? If the build system had caught the undeclared dependency during the very first build attempt, it would not have put you in an inconsistent state: you would have been forced to fix the build files before the build would actually complete.

A well-behaved build system will ensure that the build rule fails in all cases if it has not specified all necessary inputs as dependencies. By doing this, the build tool prevents you from getting into a state where you have some artifacts that were generated from inputs the tool didn’t know about.

Achieving this goal of detecting undeclared dependencies isn’t trivial if you want the build system to be fast. Consider these options:

You can run each build rule in a fresh container or virtual machine to precisely control the contents of the disk and thus what the rule can do. Unfortunately, setting up and tearing down one of these for each build rule would be prohibitively expensive. Mind you, Bazel has a Docker strategy that does just this, but it’s not useful for interactive usage.
You can relax the container approach and implement a lighter sandboxing approach to control which files the rule is allowed to access. To achieve this, you can rely on system-specific technologies such as Linux’s namespaces or macOS’s sandbox-exec, and then finely tune what each rule is allowed to do. The downsides of this approach are that the more strict you make the sandbox, the slower it becomes, and that this approach is not portable.
You can trace the activity of a rule as it runs to record the files it touches and compare that list to the list of declared inputs after-the-fact. This approach is much faster than sandboxing and I think Microsoft’s BuildXL implements this. The downside is that this requires assistance from the operating system, possibly in the form of a kernel module, which makes it a no-no in many environments.
You can rely on remote execution on an rule basis. If you use remote execution, the build tool will only ship declared inputs to a remote worker in order to run a command, and that command will fail if some of its necessary inputs were not uploaded. This solution is essentially the same as the approach to use fresh virtual machines for every rule, but scales better. And this solution can be combined with the sandboxing techniques described above to ensure that whatever happens on the remote worker doesn’t end up relying on worker-specific files.

In the case of Google, builds are clear of undeclared dependencies because they rely on remote execution by default. Any code that is checked in will be built using remote execution (even if you did not use remote execution on your development machine), so the correctness of your build rules will be enforced at that point. As a result, it is impossible for you to commit code that fails this sanity check.

File modification times

Another problem that breaks incremental builds goes like this: a source file changes but its modification time does not. After that, the build tool doesn’t notice that the file has changed and the file isn’t processed again as part of an incremental build.

You might think that this issue never happens but not all situations in which this problem arises are hypothetical or unlikely. Certainly there are tools that purposely don’t update the modification time, but these are rare. More subtle but common cases involve the file system’s timestamp resolution not being fine enough. For example: HFS+ has 1-second resolution timestamps so it’s perfectly possible to write a file once, do a build, update the file, do another build and have the second build not see the change. This seems very unlikely (who types that fast?) until you automate builds in scripts and/or your build produces and consumes auto-generated source files.

A well-behaved build system knows precisely when an artifact has to be rebuilt because it tracks file contents, not just timestamps. This ensures that the build tool is always aware of when artifacts are stale.

And in fact, this is what Bazel does internally: Bazel tracks the cryptographic digest of each file it knows about so that it can precisely know if an input is different than it was before.

The question is, though: how does the build tool know when to recompute the digests of the files? Doing this computation on each build would be precisely correct but also prohibitively expensive. (Mind you, this is what Bazel did on macOS when I started working on this port and it was not nice.) So we need a way to know when to recompute the digest of a file… and this seems to take us back to scanning for timestamp changes. Not quite.

There are various tricks we can pull off to improve on just using timestamps:

File system aids: if you control the file system on which your sources and outputs live, you can add primitives to the file system to tell you precisely what files have changed between two points in time. Imagine being able to ask the file system: “tell me the list of files that have changed between the moment the previous build ran and now”, and then using that information to only compute those digests. I’m not aware of any public file system that does this, but Bazel has the right hooks in it to implement this functionality. I know of at least one other company other than Google tried to take advantage of them.
Watching for file changes: the build tool can asynchronously monitor for changes to all files it knows about using system-specific primitives such as epoll on Linux or fsevents on macOS. By doing this, the build tool will know which files were modified without having to scan for changes.
Combining modification times with other inode details: when the previous options are not available, the build tool will have to fall back to scanning the file system and looking for changes. And… in this case, we are indeed back to inspecting modification times. But as we have seen, modification times are weak keys, so we should combine them with other details such as inode numbers and file sizes.
Understanding file system timestamp granularity: given what we discussed above, if the tool knows that the file system does not have sufficient granularity to tell changes apart, the tool can work around this. Bazel does have logic in it to compensate in the presence of HFS+, for example.

If the build tool follows all of these tricks, then using content hashes on top might seem to only bring minor benefits. But it does have them as we will see later, and they are not as “minor” as they might seem.

Command line differences

Another problem that often breaks incremental builds is when the build system does not recognize our intentions. Suppose that your project has a DEBUG feature flag that enables expensive debugging features in C++ files that exist throughout the source tree. Now suppose you do a first build with this feature disabled, then notice a bug that you want to debug, add -DDEBUG to the CFLAGS environment variable, and build a second time. This second rebuild does nothing so you are forced to do a clean and start over to convince the build system to pick up the new configuration.

This problem surfaces because the build tool only accounted for file differences in its decisions to rebuild artifacts. In this case, however, no files changed: only the configuration in the environment did and thus the build system didn’t know that it had to do anything different.

A well-behaved build system tracks non-file dependencies and how they change so that it can rebuild affected artifacts. The tool does so by explicitly being aware of the configuration that is involved in the build.

This is a very difficult problem to solve perfectly because what we are saying is that we need a way to track all environmental settings that might possibly affect the behavior of a command. In the example above, we modified the environment. But the build could also have depended on the time of the day, or certain networking characteristics, or the phase of the moon. Accounting for all of these is hard to do in an efficient manner because we are back to the discussion on sandboxing from earlier.

In practice, fortunately, we can approximate a good solution to the problem. This problem primarily arises due to explicit configuration changes triggered by the user, and these configuration changes are done either via files, flags, and environment variables. If we can account for these, then the build tool will behave in a reasonable manner in the vast majority of the cases.

One way to do achieve this goal is to force the configuration to be expressed in files (adding logic to bypass the environment), and then making all build rules depend on the configuration files. This way, when the configuration files’ modification times change, the build system will know that it has to rebuild the world and will do so. This approach indeed works and is implemented by many tools, including GNU Automake. But this approach is extremely inefficient.

Consider what happens when your project contains more than one type of rule in it, say because not all sources are C. And I’m not necessarily talking about a polyglot project: having other kinds of artifacts that are not binaries, such as documentation, are sufficient to trigger this issue. In this case, if all we did was change the value of the CFLAGS setting, we would only expect the C rules to be rebuilt. After that, we would expect the consumers of those rules to be rebuilt as well, and so on. In other words: we would only want to rebuild the dependency paths that take us from leaf rules to the rules that might possibly yield different results based on the configuration that changed.

A better (and simpler!) solution to this problem is to forget about files and to track the environmental details that affect a rule at the rule level. In this model, we extend the concept of inputs to a rule from just files to files-plus-some-metadata. The way this looks like in practice, at least in Bazel, is by making the command lines an input to the rule and by “cleaning up” the environment variables so that they are not prone to interference from user changes.

In the example we showed above, adding -DDEBUG to the C configuration would cause a rule of the form cc -o foo.o -c foo.c to become cc -DDEBUG -o foo.o -c foo.c. These are clearly different even to the untrained eye and can yield different outputs. By tracking the command line at the rule level, the build tool can know which specific rules have to be rebuilt, and will only rebuild those that were affected by our configuration change.

Output tree inconsistencies

The last problem that sometimes breaks incremental builds appears when we end up with mismatched artifacts in the output tree. As in the previous section, suppose your project has a DEBUG feature flag that enables expensive debugging features. Now suppose again that you do a full build with this feature disabled. But this time, you then go to a specific subdirectory of the project, touch a bunch of files, and rebuild that subdirectory alone with -DDEBUG because you want to troubleshoot your changes.

Now what happens? All of the outputs in the output tree were built with DEBUG disabled except for the tiny subset that was rebuilt with this flag enabled. The output tree is now inconsistent and the build tool has no way of knowing that this has happened. From this point on, things might work well, or they might not. In the case of something as DEBUG-type inconsistencies, you might observe weird performance issues at runtime, but in the case of flags that change the ABIs of the intermediate artifacts, you might observe build failures. At that point, a clean build is the only way out.

A well-behaved build system avoids inconsistent output trees by tracking the configuration that was used to build each artifact as part of the artifact itself, and groups such artifacts in a consistent manner so that they are never intermixed.

This is a very hard problem to address if you want the tool to remain usable. In the limit, you would hash the configuration and make that value part of the artifact path. Unfortunately, doing so would cause the amount of separate files in the output tree to explode, would cause disk usage to explode too, and would bring confusion as the paths in the output tree would be numerous and nonsensical.

The approach that most tools take—assuming they are aware of this problem—is to compromise. Most account only for major configuration differences in the way the output tree is laid out. Bazel and Cargo, for example, will separate release and debug builds into parallel output hierarchies. Bazel will go one step further and also account for CPU targets in this scheme. The result is a relatively usable output tree, but it is not perfectly correct because it’s still possible to end up with intermixed outputs. As far as I can tell, this is an open research area.

Collateral benefits

Wow that was long, but that’s about it regarding the kinds of problems that break incremental builds and various techniques to address them. Before proceeding to look at other benefits that we get from following these better practices, let’s review what we have seen so far:

All input files to a rule must be represented in the build graph. These have to be specified either directly in the build files or indirectly via some form of dynamic discovery or introspection.
Changes to input files have to be detected in a precise manner: modification times are insufficient. In the best case, content hashes provide correctness, but if they are unsuitable for performance reasons, other file properties such as inode numbers and file sizes should be accounted for.
All environmental details that affect a rule, and especially the command line of the rule and the environment variables passed to it, must be represented in the build graph as inputs to that rule. If these inputs change, the rule has to be rebuilt.
Artifacts have to be stored accounting for the configuration that was used to build them to prevent mixing incompatible artifacts. A common way to do this is to shard the output tree into parallel trees named after specific configuration settings (debug vs. release, target platform, etc.).

Few build systems implement all of these techniques. But once a build system has these techniques, magic happens:

Clean builds become a thing of the past, which was the whole premise of this post.
“It works on my machine” also becomes a thing of the past. Different behaviors across different machines most often come from factors that were not accounted for during the build, thus yielding different artifacts. If the build can account for all those factors, and you make sure that they are the same across machines (which you’d want to do if you were sharing caches, for example), then the builds will be the same no matter where they happen.
Caching works for free across machines and even across users. If we can express everything that affects the behavior of a rule as a cache key, then we can cache the output of the rule using that key.
Based on what we said until now, this cache key must account, at the very minimum, for the digests of all input files to the rule, the command line used in the rule, and the environment settings that might affect the rule. If you can tightly control the environment (such as by cleaning up environment variables or using sandboxing to limit network access), the better, because your cache key has to account for fewer details and will be reusable in more contexts.
Optimal builds follow. We didn’t touch upon this earlier, but a benefit that is immediately derived from tracking file contents instead of timestamps is that builds become optimally efficient.
Suppose you have a utils.c file at the base of your dependency tree. In a common build system, if you touch this file to fix a trivial typo in a comment, the system will invalidate the whole dependency chain: utils.c will be rebuilt as utils.o, then utils.o will get a newer timestamp which in turn will trigger the rebuild of all of its consumers, and so on until we reach the leaves of the dependency tree. This needn’t happen. If we track file contents instead of the modification time, and if the modification of utils.c causes the new utils.o to match the previous file bit-by-bit, then no other rule downstream from that will have to be rebuilt—even if utils.o’s timestamp changes.

There is a lot of good that comes from embracing a good build system. Bazel checks most of the boxes I have outlined until now, but “migrating to Bazel” isn’t a realistic proposition for many developers. Under those conditions, being aware of the causes behind broken incremental builds is important, because then you can apply tactical fixes to work around the deficiencies of the underlying tool.

I’m fully aware that this post has packed a ton of content, some in a haphazard way because I didn’t have the time to make it shorter. I probably also missed some key root cause behind broken incremental builds. In case of any doubt, please let me know via any of the contact links below.

And with that, let’s say good riddance to this 2020. Here is to a better 2021! 🥳

There have been times when incremental builds did actually break, but those were due to bugs in the build system itself—which are unusual. And when those kinds of bugs happen, they are considered outages and are fixed by the infrastructure teams for everyone, not by telling people to “run a clean build”. ↩︎
I can’t resist but compare what I just said here to the differences between C and Rust. Memory management problems are a fact, and no matter what we want to believe, people will continue making them if the language allows them to. The resemblance in this context is that most programs will happily run even if they have memory-related bugs—until they don’t. In the presence of such bugs, C has given us the appearance that the code was good by allowing it to compile and run, postponing the discovery of the bugs until much later. In contrast, Rust forbids us from ever getting to that stage by blocking seemingly good but unsound code from ever compiling. ↩︎
For simplicity, this post talks about build rules and build actions interchangeably even though they are not strictly the same. Whenever you see “rule”, assume I’m talking about a command that the build system executes to transform one or more source files into one or more output artifacts. ↩︎