Julio Merino's personal blog
You probably know that software rewrites, while very tempting, are expensive and can be the mistake that kills a project or a company. Yet they are routinely proposed as the solution to all problems. Is there anything you can do to minimize the risk? In this post, I propose that you actively improve the old system to ensure the new system cannot make progress in a haphazard way. This forces the new system to be designed in such a way that delivers breakthrough improvements and not just incremental improvements.
Hello everyone and welcome to this new decade! It’s already 2020 and I’m only 17 days late in writing a first post. I was planning to start with an opinion article, but as its draft is taking longer than I wanted… I’ll present you the story of a recent crazy bug that has kept me busy for the last couple of days. Java crashes with Bazel and sandboxfs On a machine running macOS Catalina, install sandboxfs and build Bazel with sandboxfs enabled, like this:
To conclude the deep dive into Bazel’s dynamic spawn strategy, let’s look at the nightmare that tree artifacts have been with the local lock-free feature. And, yes, I’m double-posting today because I really want to finish these series before the end of the decade1! Tree artifacts are a fancy name for action outputs that are directories, not files. What’s special about them is that Bazel does not know a priori what the directory contents are: the rule behind the action just specifies that there will be a directory with files, and Bazel has to treat that as the unit of output from the action.
In the previous post, we saw how accounting for artifact download times makes the dynamic strategy live to its promise of delivering the best of local and remote build times. Or does it? If you think about it closely, that change made it so that builds that were purely local couldn’t be made worse by enabling the dynamic scheduler: the dynamic strategy would always favor the local branch of a spawn if the remote one took a long time.
In the previous post of this series, we looked at how the now-legacy implementation of the dynamic strategy uses a per-spawn lock to guard accesses to the output tree. This lock is problematic for a variety of reasons and we are going to peek into one of those here. To recap, the remote strategy does the following: Send spawn execution RPC to the remote service. Wait for successful execution (which can come quickly from a cache hit).
When the dynamic scheduler is active, Bazel runs the same spawn (aka command line) remotely and locally at the same time via two separate strategies. These two strategies want to write to the same output files (e.g. object files, archives, or final binaries) on the local disk. In computing, two things trying to affect the same thing require some kind of coördination. You might think, however, that because we assume that both strategies are equivalent and will write the same contents to disk1, this is not problematic.
After introducing Bazel’s dynamic execution a couple of posts ago, it’s time to dive into its actual implementation details as promised. But pardon for the interruption in the last post, as I had to take a little detour to cover a necessary topic (local resources) for today’s article. Simply put, dynamic execution is implemented as “just” one more strategy called dynamic. The dynamic strategy, however, is different from all others because it does not have a corresponding spawn runner.