Julio Merino's personal blog
Bazel’s dynamic execution is a feature that makes your builds faster by using remote and local resources, transparently and at the same time. We launched this feature in Bazel 0.21 back in February 2019 along an introductory blog post and have been hard at work since then to improve it. The reason dynamic execution makes builds faster is two-fold: first, because we can hide hiccups in the connectivity to the remote build service; and, second, because we can take advantage of things like persistent workers, which are designed to offer super-fast edit/build/test cycles.
“Strategies? Will you talk about Bazel’s strategy for world domination 🙀?” No… not exactly that. Dynamic execution has been quite a hot topic in my work over the last few months and I am getting ready to publish a series of posts on it soon. But before I do that, I need to first review Bazel’s execution strategies because they play a big role in understanding what dynamic execution is and how it’s implemented.
In the previous posts, we saw why waiting for a process group is complicated and we covered a specific, bullet-proof mechanism to accomplish this on Linux. Now is the time to investigate this same topic on macOS. Remember that the problem we are trying to solve (#10245) is the following: given a process group, wait for all of its processes to fully terminate. macOS has a bunch of fancy features that other systems do not have, but process control is not among them.
In the previous post, we saw why waiting for a process group to terminate is important (at least in the context of Bazel), and we also saw why this is a difficult thing to do in a portable manner. So today, let’s dive into how to do this properly on a Linux system. On Linux, we have two routes: using the child subreaper feature or using PID namespaces. We’ll focus on the former because that’s what we’ll use to fix (#10245) the process wrapper1, and because they are sufficient to fully address our problem.
Process groups are a feature of Unix systems to group related processes under a common identifier, known as the PGID. Using the PGID, one can look for these related process and send signals in unison to them. This is typically used by shell interpreters to manage processes. For example, let’s launch a shell command that puts two sleep invocations in the background (those with the 10- and 20-second delays) and then sleeps the direct child (with a 5-second delay)—while also putting the whole invocation in the background so that we can inspect what’s going on:
As strange as it may sound, a very important job of any build tool is to orchestrate the execution of lots of other programs—and Bazel is no exception. Once Bazel has finished loading and analyzing the build graph, Bazel enters the execution phase. In this phase, the primary thing that Bazel does is walk the graph looking for actions to execute. Then, for each action, Bazel invokes its commands—things like compiler and linker invocations—as subprocesses.