Julio Merino's personal blog
After introducing Bazel’s dynamic execution a couple of posts ago, it’s time to dive into its actual implementation details as promised. But pardon for the interruption in the last post, as I had to take a little detour to cover a necessary topic (local resources) for today’s article. Simply put, dynamic execution is implemented as “just” one more strategy called dynamic. The dynamic strategy, however, is different from all others because it does not have a corresponding spawn runner.
How does Bazel avoid melting your workstation with concurrent subprocesses? Or… tries to, because I know it still does that sometimes? There are two mechanisms as play: the jobs number and the local resources tracker. Let’s dive into them. The jobs number, given by the --jobs flag, configures the number of concurrent Skyframe evaluators during the execution phase1. What a mouthful. What this essentially means is that jobs indicates the number of threads used to walk the graph looking for actions to execute—and also executing them.
Bazel’s dynamic execution is a feature that makes your builds faster by using remote and local resources, transparently and at the same time. We launched this feature in Bazel 0.21 back in February 2019 along an introductory blog post and have been hard at work since then to improve it. The reason dynamic execution makes builds faster is two-fold: first, because we can hide hiccups in the connectivity to the remote build service; and, second, because we can take advantage of things like persistent workers, which are designed to offer super-fast edit/build/test cycles.
“Strategies? Will you talk about Bazel’s strategy for world domination 🙀?” No… not exactly that. Dynamic execution has been quite a hot topic in my work over the last few months and I am getting ready to publish a series of posts on it soon. But before I do that, I need to first review Bazel’s execution strategies because they play a big role in understanding what dynamic execution is and how it’s implemented.
In the previous posts, we saw why waiting for a process group is complicated and we covered a specific, bullet-proof mechanism to accomplish this on Linux. Now is the time to investigate this same topic on macOS. Remember that the problem we are trying to solve (#10245) is the following: given a process group, wait for all of its processes to fully terminate. macOS has a bunch of fancy features that other systems do not have, but process control is not among them.
In the previous post, we saw why waiting for a process group to terminate is important (at least in the context of Bazel), and we also saw why this is a difficult thing to do in a portable manner. So today, let’s dive into how to do this properly on a Linux system. On Linux, we have two routes: using the child subreaper feature or using PID namespaces. We’ll focus on the former because that’s what we’ll use to fix (#10245) the process wrapper1, and because they are sufficient to fully address our problem.