BazelCon 2024 recap

Just like that, BazelCon 2024 came and went. So… it’s obviously time to summarize the two events of last week: BazelCon 2024 and the adjacent Build Meetup. There is A LOT to cover, but everything is here in just one article!

October 22, 2024 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/blogsystem5">blogsystem5</a>, <a href="/tags/snowflake">snowflake</a>
Continue reading (about 42 minutes)

Bazel interview at Software Engineering Daily

Just a bit over 2 months ago, on October 5th, 2023, Jordi Mon Companys interviewed me about Bazel for an episode in the Software Engineering Daily podcast. The episode finally came out on December 18th, 2023, so here is your announcement to stop by and listen to it! If you don’t have time to listen to the whole 45 minutes, or if you want to get a sense of what you will get out of it, here is a recap of everything we touched on. Every paragraph is annotated with the rough time where the discussion starts so that you can jump right in to whatever interests you the most.

December 21, 2023 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/blogsystem5">blogsystem5</a>
Continue reading (about 5 minutes)

Strings, encodings, NULs and Bazel

Just yesterday, Twitter user @vkrajacic wrote: Advice for new C programmers: “Avoid null-terminated strings; they’re outdated, inefficient and impractical.” Create your own type with basic functions. It’s not that hard, and it goes a long way. One of the benefits of this approach, among others, is slicing without copying. This suggestion has its merits and I understand where it is coming from: performance. You see: the traditional way to represent strings in C is to use NUL-terminated byte arrays. Yet… this has been deemed the most expensive one-byte mistake because of the adverse performance implications that it carries. (NUL, not NULL, is the better name for the \0 byte by the way.)

December 3, 2023 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/blogsystem5">blogsystem5</a>, <a href="/tags/java">java</a>
Continue reading (about 4 minutes)

End-to-end tool testing with Bazel and shtk

If you use Bazel, your project is of moderate size. And because your project is of moderate size, it almost-certainly builds one or more binaries, at least one of which is a CLI tool. But let’s face it: you don’t have end-to-end testing for those tools, do you? I’m sure you have split the binary’s main function into its own file so that the rest of the tool can be put in a library, and I’m extra-sure that you have unit tests for such a library. But… those tests do little to verify the functionality and quality of the tool as experienced by the end user. Consider: What exactly does the tool print to the console on success? Does it show errors nicely when they happen, or does it dump internal stack traces? How does it handle unknown flags or bad arguments? Is the built-in help message nicely rendered when your terminal is really wide? What if the terminal is narrow? You must write end-to-end tests for your tools but, usually, that isn’t easy to do. Until today. Combining shtk with Bazel via the new rules_shtk ruleset makes it trivial to write tests that verify the behavior of your CLI tools—no matter what language they are written in—and in this article I’m going to show you how.

November 4, 2023 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/blogsystem5">blogsystem5</a>, <a href="/tags/shell">shell</a>, <a href="/tags/shtk">shtk</a>, <a href="/tags/testing">testing</a>
Continue reading (about 7 minutes)

BazelCon 2023 et al. trip report

I’m exhausted. I just came back to Seattle from a 10-day trip in which I attended three different Bazel events: the Build Meetup in Reykjavik, the Bazel Community Day in Munich, and BazelCon 2023 in Munich too. Oh, and because I was on the other side of the world, I also paid a visit to my family in Spain. Attending these events has been incredibly useful and productive: I got exposure to many ideas and discussions that would just not happen online, I got to build connections with very interesting people and, of course, it has also been super fun to reconnect with old coworkers and friends. This article contains the summary of the things I learned and the things I want to follow up on. These are just a bunch of cleaned-up notes which I took and are in the context of my work with Bazel at Snowflake and my interests in build tools, so this is not endorsed by Snowflake.

October 30, 2023 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/blogsystem5">blogsystem5</a>, <a href="/tags/snowflake">snowflake</a>
Continue reading (about 15 minutes)

Build farm visualizations

If you have followed our recent infrastructure posts, you know by now that we are actively migrating Snowflake’s build to Bazel. What we haven’t shared yet is that we have deployed our own Build Barn cluster to support Bazel’s remote execution features. We have chosen to run our own build farm service for resource governance and security purposes, but also because the behavior of this system impacts the developer experience so directly that we want to have full in-house control and knowledge of it.

October 20, 2023 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/snowflake">snowflake</a>
Continue reading (about 10 minutes)

Analyzing OOMs in IntelliJ with Bazel

A few months ago, we described how we fixed three different OOM scenarios in our ongoing migration to the Bazel build system here at Snowflake. Things had been sailing along just fine since then… but a new issue showed up recently: our IntelliJ with Bazel (IjwB) Java project started showing OOMs during its sync phase. The reason this issue surfaced now is that, as we continue our migration to Bazel, our IjwB project has grown in size. Months ago, our project only covered a Java binary, but now that we have migrated all of its unit and integration tests as well, the project covers them too. It is common for tests to be more expensive to build and run than the binary they validate—tests depend on the binary’s dependencies plus many other helper tools for testing—and these caused the project to grow too big to fit in our development environments. Or did they?

October 7, 2023 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/snowflake">snowflake</a>
Continue reading (about 9 minutes)

Addressing Bazel OOMs

Here at Snowflake, the Developer Productivity organization (DPE for short) is tackling some important problems we face as a company: namely, lengthening build times and complex development environments. A key strategy we are pursuing to resolve these is the migration of key build processes from CMake and Maven to Bazel. We are still in the early stages of this migration and cannot yet share many details or a success story, but we can start explaining some of the issues we encounter as we work through this ambitious project.

March 16, 2023 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/snowflake">snowflake</a>
Continue reading (about 16 minutes)

Defining build time SLIs and SLOs

Companies grow, and with them do the software projects that support them. It should be no surprise that larger programs require longer build times. And, if I had to guess, you have seen how those build times eventually grow to unbearable levels, reducing productivity and degrading quality. In this post, I examine how we can leverage the common techniques we use for production services—namely SLIs and SLOs—to keep build times on track.

March 12, 2021 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/development">development</a>, <a href="/tags/monorepo">monorepo</a>
Continue reading (about 16 minutes)

How does Google keep build times low?

Monorepos are an interesting beast. If tended properly, they enable a level of uniformity and code quality that is hard to achieve otherwise. If left unattended, however, they become unmanageable monsters of tangled dependencies, slow builds, and frustrating developer experiences. Whether you have a good or bad experience directly depends on the level of engineering support behind the monorepo. Simply put, monorepos require dedicated teams and tools to run nicely. In this post, I will look at how almost-perfect caching plays a key role in keeping build times manageable under such an environment.

February 26, 2021 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/featured">featured</a>, <a href="/tags/monorepo">monorepo</a>, <a href="/tags/opinion">opinion</a>
Continue reading (about 11 minutes)

How does Google avoid clean builds?

During my 11 years at Google, I can confidently count the number of times I had to do a “clean build” with one hand: their build system is so robust that incremental builds always work. Phrases like “clean everything and try building from scratch” are unheard of. So… you can color me skeptical when someone says that incremental build problems are due to bugs in the build files and not due to a suboptimal build system. The answer lies in having a robust build system, and in this post I’ll examine the common causes behind incremental build breakages, what the build system can do to avoid them, and how Bazel accomplishes most of them.

December 31, 2020 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/featured">featured</a>, <a href="/tags/google">google</a>, <a href="/tags/monorepo">monorepo</a>, <a href="/tags/opinion">opinion</a>
Continue reading (about 20 minutes)

The final boss: Bazel's own JNI code

As you might have read elsewhere, I’m leaving the Bazel team and Google in about a week. My plan for these last few weeks was to hand things off as cleanly as possible… but I was also nerd-sniped by a bug that came my way a fortnight ago. Fixing it has been my self-inflicted punishment for leaving, and oh my, it has been painful. Very painful. Let me tell you the story of this final boss.

October 9, 2020 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/bug">bug</a>
Continue reading (about 13 minutes)

Bazel output streaming, Ctrl+C, and test flakiness

About two weeks ago, I found a very interesting bug in Bazel’s test output streaming functionality while writing tests for a new feature related to Ctrl+C interrupts. I fixed the bug, wrote a test for it, and… the test itself came back as flaky, which made me find another very subtle bug in the test that needed a one-line fix. This is the story of both. Bazel has a feature known as test output streaming: by default, Bazel captures the outputs (stdout and stderr) of the tests it runs, saves those in local log files, and tells the user where they are when a test fails. This is not very ergonomic when you are iterating on a test, so you can make Bazel print the output of the test as it runs by passing --test_output=streamed to the invocation.

September 18, 2020 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/bug">bug</a>
Continue reading (about 9 minutes)

Bazel UI locking and file downloads

About a month ago, I was benchmarking the impact of a new Bazel feature and I noticed that a test build that should have taken only a few seconds took almost 10 minutes. My Internet connection was flaking out indeed, but something else didn’t seem right. So I looked and found that Bazel was doing network calls within a critical section, and these were the root cause behind the massive slowdown. But how did we get such an obvious no-no into the codebase? Read on to see how this happened and how gnarly it was to fix!

September 1, 2020 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/bug">bug</a>
Continue reading (about 12 minutes)

Shipping Bazel's new dynamic scheduler

Back in September 2019, I embarked on the task of rewriting Bazel’s dynamic scheduler to deal with slow and flaky networks. Initial testing had shown that dynamic builds might become slower, and it was all due to this feature having been designed for a different use case (in-office, high-speed network). We had to fix two different issues in the scheduler. The first fix was making the downloads of the remote artifacts happen without holding the output lock. In this way, a local action would be allowed to execute while the remote action was still fetching outputs (possibly never terminating if the connection was flaky). This took several attempts to get to a stable fix, but I could eventually ship this by November, which in turn unblocked us to roll out the real feature we wanted to deliver to our iOS user base.

June 12, 2020 · Tags: <a href="/tags/bazel">bazel</a>
Continue reading (about 10 minutes)

Running codesign over SSH with a new key

I just spent somewhere between 30 minutes and 1 hour convincing the Mac Pro that sits in my office to successfully codesign an iOS app via Bazel. This was after having to update the signing key to a newer one and after rebooting the machine due to the macOS 10.15.5 upgrade—all remotely thanks to COVID-19. The build of the app was failing with an errSecInternalComponent error printed by codesign. It is not the first time I have faced this, but in all previous cases, I had either been at the computer to click through security popups, had had functional Chrome Remote Desktop access, or did not have to install a new signing key remotely.

May 29, 2020 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/macos">macos</a>
Continue reading (about 3 minutes)

Ensuring system rewrites are truly necessary

You probably know that software rewrites, while very tempting, are expensive and can be the mistake that kills a project or a company. Yet they are routinely proposed as the solution to all problems. Is there anything you can do to minimize the risk? In this post, I propose that you actively improve the old system to ensure the new system cannot make progress in a haphazard way. This forces the new system to be designed in such a way that delivers breakthrough improvements and not just incremental improvements.

January 24, 2020 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/essay">essay</a>, <a href="/tags/featured">featured</a>, <a href="/tags/sre">sre</a>
Continue reading (about 7 minutes)

The OSXFUSE, hard links, and dladdr puzzle

Hello everyone and welcome to this new decade! It’s already 2020 and I’m only 17 days late in writing a first post. I was planning to start with an opinion article, but as its draft is taking longer than I wanted… I’ll present you the story of a recent crazy bug that has kept me busy for the last couple of days.

Java crashes with Bazel and sandboxfs

On a machine running macOS Catalina, install sandboxfs and build Bazel with sandboxfs enabled, like this:

January 17, 2020 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/fuse">fuse</a>, <a href="/tags/internals">internals</a>, <a href="/tags/programming">programming</a>, <a href="/tags/sandboxfs">sandboxfs</a>
Continue reading (about 13 minutes)

Tree artifacts and transient files

To conclude the deep dive into Bazel’s dynamic spawn strategy, let’s look at the nightmare that tree artifacts have been with the local lock-free feature. And, yes, I’m double-posting today because I really want to finish this series before the end of the decade! Tree artifacts are a fancy name for action outputs that are directories, not files. What’s special about them is that Bazel does not know a priori what the directory contents are: the rule behind the action just specifies that there will be a directory with files, and Bazel has to treat that as the unit of output from the action. Other than that, tree artifacts are “just” a different kind of output.

December 31, 2019 · Tags: <a href="/tags/bazel">bazel</a>
Continue reading (about 4 minutes)

Lifting the local lock for dynamic execution

In the previous post, we saw how accounting for artifact download times makes the dynamic strategy live up to its promise of delivering the best of local and remote build times. Or does it? If you think about it closely, that change made it so that builds that were purely local couldn’t be made worse by enabling the dynamic scheduler: the dynamic strategy would always favor the local branch of a spawn if the remote one took a long time. But for builds that were better off when they were fully remote (think of a fully-cached build with great networking), this is not true: the dynamic strategy might hurt them because we may discard some of those remote cache hits.

December 31, 2019 · Tags: <a href="/tags/bazel">bazel</a>
Continue reading (about 4 minutes)

Artifact downloads and dynamic execution

In the previous post of this series, we looked at how the now-legacy implementation of the dynamic strategy uses a per-spawn lock to guard accesses to the output tree. This lock is problematic for a variety of reasons and we are going to peek into one of those here. To recap, the remote strategy does the following:

1. Send spawn execution RPC to the remote service.
2. Wait for successful execution (which can come quickly from a cache hit).
3. Lock the output tree (only when run within the dynamic strategy).
4. Download the spawn’s outputs directly into the output tree.

Note how we lock the output tree before we have downloaded any outputs, and taking the lock means that the local branch of the same spawn cannot start or complete even if there are plenty of local resources available to run it.
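
The effect of taking the lock before the download can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Bazel's code: `output_lock`, `run_spawn`, and the simulated download delay are hypothetical names invented for the example, and two Python threads stand in for the remote and local branches of one spawn.

```python
import threading
import time

# Hypothetical stand-in for the per-spawn lock that guards the output tree.
output_lock = threading.Lock()


def run_spawn(download_seconds=0.1):
    """Return the order of events when the remote branch locks before downloading."""
    events = []
    lock_taken = threading.Event()

    def remote_branch():
        # The execution RPC has been sent and has completed (e.g. a cache hit).
        events.append("remote: execution finished")
        with output_lock:
            # The output tree is locked before any output has been fetched.
            events.append("remote: lock taken")
            lock_taken.set()
            time.sleep(download_seconds)  # simulated (possibly slow) download
            events.append("remote: outputs downloaded")

    def local_branch():
        # Wait until the remote branch owns the lock so the demo is deterministic.
        lock_taken.wait()
        with output_lock:  # blocked for the whole duration of the download
            events.append("local: lock taken")

    threads = [
        threading.Thread(target=remote_branch),
        threading.Thread(target=local_branch),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    for event in run_spawn():
        print(event)
```

In this model the local branch can only proceed once the download has finished, no matter how idle the local machine is, which is the behavior the excerpt describes as problematic.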

December 30, 2019 · Tags: <a href="/tags/bazel">bazel</a>
Continue reading (about 5 minutes)

Output conflicts and dynamic execution

When the dynamic scheduler is active, Bazel runs the same spawn (aka command line) remotely and locally at the same time via two separate strategies. These two strategies want to write to the same output files (e.g. object files, archives, or final binaries) on the local disk. In computing, two things trying to affect the same thing require some kind of coördination. You might think, however, that because we assume that both strategies are equivalent and will write the same contents to disk, this is not problematic. But, in fact, it can be, because file creations/writes are not atomic. So we need some form of mutual exclusion in place to avoid races.
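
As a rough illustration of that mutual exclusion, the sketch below lets two threads race to produce the same output file and uses a lock so that only the first one writes it. Everything here is an assumption made for the example: `OutputGuard`, `write_output`, and `race` are hypothetical names, and the real scheduler is considerably more involved.

```python
import os
import tempfile
import threading


class OutputGuard:
    """Hypothetical guard that lets only one branch write a spawn's output."""

    def __init__(self):
        self._lock = threading.Lock()
        self.winner = None

    def write_output(self, strategy, path, contents):
        # Only one branch at a time may touch the output file.
        with self._lock:
            if self.winner is not None:
                return False  # the other branch already produced the output
            with open(path, "wb") as f:
                f.write(contents)
            self.winner = strategy
            return True


def race(contents=b"object code"):
    """Run a 'local' and a 'remote' branch that both try to write one file."""
    guard = OutputGuard()
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "foo.o")

        def branch(name):
            results[name] = guard.write_output(name, path, contents)

        threads = [
            threading.Thread(target=branch, args=(name,))
            for name in ("local", "remote")
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with open(path, "rb") as f:
            written = f.read()
    return results, written
```

Whichever branch gets the lock first wins; the loser sees that the output already exists and backs off instead of overwriting a file that may be in use.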

December 27, 2019 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/sandboxfs">sandboxfs</a>
Continue reading (about 4 minutes)

Bazel's dynamic strategy

After introducing Bazel’s dynamic execution a couple of posts ago, it’s time to dive into its actual implementation details as promised. But pardon the interruption in the last post, as I had to take a little detour to cover a necessary topic (local resources) for today’s article. Simply put, dynamic execution is implemented as “just” one more strategy called dynamic. The dynamic strategy, however, is different from all others because it does not have a corresponding spawn runner. Instead, the dynamic strategy wraps two different strategies: one for local execution and one for remote execution.

December 26, 2019 · Tags: <a href="/tags/bazel">bazel</a>
Continue reading (about 3 minutes)

How does Bazel track local resource usage?

How does Bazel avoid melting your workstation with concurrent subprocesses? Or… tries to, because I know it still does that sometimes? There are two mechanisms at play: the jobs number and the local resources tracker. Let’s dive into them. The jobs number, given by the --jobs flag, configures the number of concurrent Skyframe evaluators during the execution phase. What a mouthful. What this essentially means is that jobs indicates the number of threads used to walk the graph looking for actions to execute—and also executing them. So if we have N threads, each of which processes one node of the graph at a time, and we know that most nodes trigger process executions, we have at most N concurrent spawns.
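
The "at most N concurrent spawns" bound can be observed with a plain thread pool. This is a minimal sketch, assuming that a fixed-size pool is a fair model of the evaluator threads; `run_actions` is a hypothetical helper and the short sleep stands in for a real subprocess.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_actions(num_actions, jobs):
    """Run fake actions on `jobs` threads and return the peak concurrency seen."""
    running = 0
    peak = 0
    guard = threading.Lock()

    def action(_):
        nonlocal running, peak
        with guard:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # stand-in for running a compiler or linker
        with guard:
            running -= 1

    # Each worker thread handles one action at a time, like one graph node.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        list(pool.map(action, range(num_actions)))
    return peak


if __name__ == "__main__":
    print("peak concurrent actions:", run_actions(num_actions=20, jobs=4))
```

The peak never exceeds the size of the pool, regardless of how many actions are queued.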

December 23, 2019 · Tags: <a href="/tags/bazel">bazel</a>
Continue reading (about 5 minutes)

Introduction to Bazel's dynamic execution

Bazel’s dynamic execution is a feature that makes your builds faster by using remote and local resources, transparently and at the same time. We launched this feature in Bazel 0.21 back in February 2019 along with an introductory blog post and have been hard at work since then to improve it. The reason dynamic execution makes builds faster is two-fold: first, because we can hide hiccups in the connectivity to the remote build service; and, second, because we can take advantage of things like persistent workers, which are designed to offer super-fast edit/build/test cycles. Put in numbers, here is what dynamic execution looks like for a relatively large iOS build I measured at Google a few months ago:

December 20, 2019 · Tags: <a href="/tags/bazel">bazel</a>
Continue reading (about 3 minutes)

What are Bazel's strategies?

“Strategies? Will you talk about Bazel’s strategy for world domination 🙀?” No… not exactly that. Dynamic execution has been quite a hot topic in my work over the last few months and I am getting ready to publish a series of posts on it soon. But before I do that, I need to first review Bazel’s execution strategies because they play a big role in understanding what dynamic execution is and how it’s implemented.

December 14, 2019 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/sandboxfs">sandboxfs</a>
Continue reading (about 6 minutes)

Waiting for process groups, macOS edition

In the previous posts, we saw why waiting for a process group is complicated and we covered a specific, bullet-proof mechanism to accomplish this on Linux. Now is the time to investigate this same topic on macOS. Remember that the problem we are trying to solve (#10245) is the following: given a process group, wait for all of its processes to fully terminate. macOS has a bunch of fancy features that other systems do not have, but process control is not among them. We do not have features like Linux’s child subreaper or PID namespaces to keep track of process groups. Therefore, we’ll have to roll our own. And the only way to do this is to scan the process table looking for processes with the desired process group identifier (PGID) and waiting until they are gone.

November 15, 2019 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/darwin">darwin</a>, <a href="/tags/macos">macos</a>, <a href="/tags/unix">unix</a>
Continue reading (about 8 minutes)

Waiting for process groups, Linux edition

In the previous post, we saw why waiting for a process group to terminate is important (at least in the context of Bazel), and we also saw why this is a difficult thing to do in a portable manner. So today, let’s dive into how to do this properly on a Linux system. On Linux, we have two routes: using the child subreaper feature or using PID namespaces. We’ll focus on the former because that’s what we’ll use to fix (#10245) the process wrapper, and because it is sufficient to fully address our problem.

November 14, 2019 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/linux">linux</a>, <a href="/tags/unix">unix</a>
Continue reading (about 4 minutes)

Waiting for process groups, introduction

Process groups are a feature of Unix systems to group related processes under a common identifier, known as the PGID. Using the PGID, one can look for these related processes and send signals in unison to them. This is typically used by shell interpreters to manage processes. For example, let’s launch a shell command that puts two sleep invocations in the background (those with the 10- and 20-second delays) and then sleeps the direct child (with a 5-second delay)—while also putting the whole invocation in the background so that we can inspect what’s going on:

November 12, 2019 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/unix">unix</a>
Continue reading (about 6 minutes)

Bazel's process-wrapper helper tool

As strange as it may sound, a very important job of any build tool is to orchestrate the execution of lots of other programs—and Bazel is no exception. Once Bazel has finished loading and analyzing the build graph, Bazel enters the execution phase. In this phase, the primary thing that Bazel does is walk the graph looking for actions to execute. Then, for each action, Bazel invokes its commands—things like compiler and linker invocations—as subprocesses. Under the default configuration, all of these commands run on your machine.

November 8, 2019 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/internals">internals</a>, <a href="/tags/unix">unix</a>
Continue reading (about 6 minutes)

A quick glance at macOS' sandbox-exec

macOS includes a sandboxing mechanism to closely control what processes can do on the system. Sandboxing can restrict file system accesses on a path level, control which host/port pairs can be reached over the network, limit which binaries can be executed, and much more. All applications installed via the App Store are subject to sandboxing. This sandboxing functionality is exposed via the sandbox-exec(1) command-line utility, which unfortunately has been listed as deprecated for at least the last two major versions of macOS. It is still there, however, and the supplemental manual pages like sandbox(7) or sandboxd(8) do not mention the deprecation… which makes me think that the new App Sandboxing feature is built on the same kernel subsystem as sandbox-exec(1).

November 1, 2019 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/macos">macos</a>
Continue reading (about 3 minutes)

Optimizing tree deletions in Bazel

Bazel likes creating very deep and large trees on disk during a build. One example is the output tree, which naturally contains all the artifacts of your build. Another, more problematic example is the symlink forest trees created for every action when sandboxing is enabled. As garbage gets created, it must be deleted. It turns out, however, that deleting file system trees can be very expensive—and especially so on macOS. In fact, calls to our deleteTree algorithm routinely showed up in my profiling runs when trying to diagnose slowdowns using the dynamic scheduler. One thing I quickly wondered is: why can I easily catch Bazel stuck in the tree deletion but I can never catch it busily creating such a tree? Is tree deletion inherently slow or are we doing something stupid?

March 22, 2019 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/macos">macos</a>, <a href="/tags/performance">performance</a>
Continue reading (about 4 minutes)

Darwin's QoS service classes and performance

Since the publication of Bazel a few years ago, users have reported (and I myself have experienced) general slowdowns when Bazel is running on Macs: things like the window manager stuttering and the web browser failing to load new pages. Similarly, after the introduction of the dynamic spawn scheduler, some users reported slower builds than pure remote or pure local builds, which made no sense. All along we guessed that these problems were caused by Bazel’s abuse of system threads, as it used to spawn 200 runnable threads during analysis and used to run 200 concurrent compiler subprocesses. We tackled the problem by reducing Bazel’s abuse (e.g. commit ac88041) of system resources… and while we saw an improvement, the issue remained.

March 6, 2019 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/featured">featured</a>, <a href="/tags/macos">macos</a>, <a href="/tags/performance">performance</a>
Continue reading (about 6 minutes)

Using setenv equals setting global variables

This is the tale of yet another Bazel bug, this time involving environment variables, global state, and gRPC. Through it, I’ll argue that you should never use setenv within a program unless you are doing so to execute something else.
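
To make the argument concrete, here is a small sketch of the alternative to mutating the process-wide environment: build a private copy and hand it only to the child process. The variable name `SETENV_SKETCH_VAR` and the helper `run_with_env` are hypothetical, and the example is in Python rather than the C or C++ code the bug lived in.

```python
import os
import subprocess
import sys


def run_with_env(extra_env):
    """Run a child process with extra variables without touching os.environ."""
    # Copy the environment instead of mutating global state: other threads
    # and libraries in this process never observe the change.
    child_env = dict(os.environ)
    child_env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c",
         "import os; print(os.environ['SETENV_SKETCH_VAR'])"],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_with_env({"SETENV_SKETCH_VAR": "hello"}))
    print("SETENV_SKETCH_VAR" in os.environ)
```

The child sees the variable, while the parent's own environment stays untouched, so nothing else running in the parent can be surprised by it.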

February 22, 2019 · Tags: <a href="/tags/bazel">bazel</a>
Continue reading (about 4 minutes)

Encode your assumptions

The point of this post is simple and I’ll spoil it from the get go: every time you make an assumption in a piece of code, make that assumption explicit in the form of an assertion or error check. If you cannot do that (are you sure?), then write a detailed comment. In fact, I’m exceedingly convinced that the amount of assertion-like checks in a piece of code is a good indicator of the programmer’s expertise.
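
As a tiny illustration of the idea, the function below relies on two assumptions about its inputs and states both as explicit error checks instead of leaving them implicit. `strip_root` is a hypothetical helper written for this example only.

```python
import posixpath


def strip_root(root, path):
    """Return `path` relative to `root`, checking the assumptions it relies on."""
    # Assumption 1: the root is an absolute path.
    if not posixpath.isabs(root):
        raise ValueError(f"root must be absolute, got {root!r}")
    prefix = root.rstrip("/") + "/"
    # Assumption 2: the path actually lives under the root.
    if not path.startswith(prefix):
        raise ValueError(f"{path!r} is not under {root!r}")
    return path[len(prefix):]


if __name__ == "__main__":
    print(strip_root("/out", "/out/bin/tool"))
```

Without the checks, a relative root or an unrelated path would silently produce a wrong result; with them, the broken assumption surfaces at the point where it was made.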

February 7, 2019 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/production-software">production-software</a>, <a href="/tags/readability">readability</a>, <a href="/tags/software">software</a>
Continue reading (about 4 minutes)

Hello, sandboxfs 0.1.0

I am pleased to announce that the first release of sandboxfs, 0.1.0, is finally here! You can download the sources and prebuilt binaries from the 0.1.0 release page and you can read the installation instructions for more details. The journey to this first release has been a long one. sandboxfs was first conceived over two years ago, was first announced in August 2017, showed its first promising results in April 2018, and has been undergoing a rewrite from Go to Rust. (And by the way, this has been my 20% project at Google so rest assured that they are still possible!)

February 5, 2019 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/featured">featured</a>, <a href="/tags/pkg_comp">pkg_comp</a>, <a href="/tags/sandboxctl">sandboxctl</a>, <a href="/tags/sandboxfs">sandboxfs</a>, <a href="/tags/software">software</a>
Continue reading (about 7 minutes)

Open files limit, macOS, and the JVM

Bazel’s original raison d’être was to support Google’s monorepo. A consequence of using a monorepo is that some builds will become very large. And large builds can be very resource hungry, especially when using a tool like Bazel that tries to parallelize as many actions as possible for efficiency reasons. There are many resource types in a system, but today I’d like to focus on the number of open files at any given time (nofiles).
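
For readers who want to poke at this limit, the sketch below reads the open files limit of the current process and tries to raise the soft value. It uses Python's standard `resource` module on a Unix system rather than anything from Bazel or the JVM; `raise_nofile_soft_limit` is a hypothetical helper, and the assumption that a platform may reject the request is handled by simply keeping the current limit.

```python
import resource


def raise_nofile_soft_limit(wanted):
    """Try to raise the soft open-files limit to `wanted`; return the soft limit in effect."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never ask for more than the hard limit allows.
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if soft != resource.RLIM_INFINITY and target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # Some platforms or sandboxes reject the request; keep the old limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print("soft limit before:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
    print("soft limit after: ", raise_nofile_soft_limit(4096))
```

The change only affects the current process and the children it spawns afterwards.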

January 29, 2019 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/jvm">jvm</a>, <a href="/tags/macos">macos</a>, <a href="/tags/monorepo">monorepo</a>, <a href="/tags/portability">portability</a>
Continue reading (about 3 minutes)

A few extra system calls... and you lose 1% build time

Blaze—the variant of Bazel used internally at Google—was originally designed to build the Google monorepo. One of the beauties of sticking to a monorepo is code reuse, but this has the unfortunate side-effect of dependency bloat. As a result, Bazel and Blaze have evolved to support ever-increasingly-bigger pieces of software. The growth of the projects built by Bazel and Blaze has had the unsurprising consequence that our engineers all now have high-end workstations with access to massive amounts of distributed resources. And, as you can imagine, this has had an impact on the design of Blaze: many chunks of our codebase can—and do—assume that everyone has powerful hardware. These assumptions break down as soon as you move into Bazel’s open source land: while knowing where the product really runs is out of our hands, we can safely assume it is certainly being used on slower hardware.

April 30, 2018 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/google">google</a>, <a href="/tags/monorepo">monorepo</a>, <a href="/tags/software">software</a>
Continue reading (about 4 minutes)

Preliminary sandboxfs support in Bazel

During the summer of last year, I hosted an intern who implemented sandboxfs: a FUSE-based file system that exposes an arbitrary view of the host’s file system under the mount point. At the end of his internship, we had a functional sandboxfs implementation and some draft patches for integration in Bazel. The goal of sandboxfs in the context of Bazel is to improve the performance of builds when action sandboxing is enabled. The way in which we try to do so is by replacing the costly process of setting up the file system for each action using symlinks with a file system that does so “instantaneously”.

April 13, 2018 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/google">google</a>, <a href="/tags/sandboxfs">sandboxfs</a>, <a href="/tags/software">software</a>
Continue reading (about 2 minutes)

Stick to your project's core language in your tests

This post is a short, generalized summary of the preceding two. I believe those two posts put readers off due to their massive length and the fact that they were seemingly tied to Bazel and Java, thus failing to communicate the larger point I wanted to make. Let’s try to distill their key points here in a language- and project-agnostic manner.

March 27, 2018 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/featured">featured</a>, <a href="/tags/google">google</a>, <a href="/tags/software">software</a>
Continue reading (about 3 minutes)

A case for writing Bazel's integration tests in Java, part 2

In part 1 of this series, I made the case that you should run away from the shell when writing integration tests for your software and that you should embrace the primary language of your project to write those. Depending on the language you are using, doing this will mean significantly more work upfront to lay out the foundations for your tests, but this work will pay off. You may also feel that the tests could be more verbose than if they were in shell, though that’s not necessarily the case.

March 19, 2018 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/google">google</a>, <a href="/tags/sandboxfs">sandboxfs</a>, <a href="/tags/software">software</a>
Continue reading (about 12 minutes)

A case for writing Bazel's integration tests in Java, part 1

My latest developer productivity rant thesis is that integration tests should be written in the exact same language as the thing they test. Specifically, not shell. This theory applies mostly to tests that verify infrastructure software like servers or command line tools. It is too easy to fall into the trap of using the shell because it feels like the natural choice to interact with tools. But I argue that this is a big mistake that hurts the long-term health of the project, and once trapped, it’s hard to escape.

March 16, 2018 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/google">google</a>, <a href="/tags/software">software</a>
Continue reading (about 14 minutes)

Introducing sandboxfs

sandboxfs is a FUSE-based file system that exposes an arbitrary view of the host’s file system under the mount point, and offers access controls that differ from those of the host. You can think of sandboxfs as an advanced version of bindfs (or mount --bind or mount_null(8) depending on your system) in which you can combine and nest directories under an arbitrary layout. The primary use case for this project is to provide a better file system sandboxing technique for the Bazel build system. The goal here is to run each build action (think compiler invocation) in a sandbox so that its inputs and outputs are tightly controlled, and sandboxfs attempts to do this in a more efficient manner than the current symlinks-based implementation.

August 25, 2017 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/pkg_comp">pkg_comp</a>, <a href="/tags/sandboxfs">sandboxfs</a>, <a href="/tags/software">software</a>, <a href="/tags/sourcachefs">sourcachefs</a>
Continue reading (about 2 minutes)

Joining the Blaze team

It has been over 6 years since I joined Google and throughout this time I have been in the Storage SRE family: first with GFS, then with Colossus, and last with Persistent Disk. Even though this counts as 3 different teams, the reality is that I have been doing mostly the same type of work all around. I had pondered the idea of switching to a pure Software Engineer (SWE) role for all these years and never taken any action. Until now. Things change, and the time has come for me to make a move and pursue that thought in an effort to grow in a different direction. And why now, you ask? Well, simply because I have found a role in the NYC office for a project that I am personally passionate about.

January 19, 2016 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/google">google</a>, <a href="/tags/work">work</a>
Continue reading (about 3 minutes)

On Bazel and Open Source

This is a rare post because I don’t usually talk about Google stuff here, and this post is about Bazel: a tool recently published by Google. Why? Because I love its internal counterpart, Blaze, and believe that Bazel has the potential to be one of the best build tools if it is not already. However, Bazel currently has some shortcomings when it comes to catering to a certain kind of important project in the open source ecosystem: the projects that form the foundation of open source operating systems. This post is, exclusively, about this kind of project.

April 14, 2015 · Tags: <a href="/tags/bazel">bazel</a>, <a href="/tags/featured">featured</a>, <a href="/tags/google">google</a>, <a href="/tags/software">software</a>
Continue reading (about 17 minutes)