About two weeks ago, I found a very interesting bug in Bazel’s test output streaming functionality while writing tests for a new feature related to Ctrl+C interrupts. I fixed the bug, wrote a test for it, and… the test itself came back as flaky, which made me find another very subtle bug in the test that needed a one-line fix. This is the story of both. Bazel has a feature known as test output streaming: by default, Bazel captures the outputs (stdout and stderr) of the tests it runs, saves those in local log files, and tells the user where they are when a test fails.
About a month ago, I was benchmarking the impact of a new Bazel feature and I noticed that a test build that should have taken only a few seconds took almost 10 minutes. My Internet connection was indeed flaking out, but something else didn’t seem right. So I looked and found that Bazel was making network calls within a critical section, and these were the root cause behind the massive slowdown. But how did we get such an obvious no-no into the codebase? Read on to see how this happened and how gnarly it was to fix!
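
To make the "network calls within a critical section" problem concrete, here is a minimal Python sketch. It is not Bazel's code (Bazel is written in Java), and every name in it (`slow_fetch`, `cache`, `get_holding_lock`, `get_outside_lock`, `timed`) is hypothetical. A short `time.sleep` stands in for the network round trip. The first getter waits on the slow call while holding the lock, so all threads line up behind it; the second only holds the lock for the quick cache lookups.

```python
import threading
import time

cache_lock = threading.Lock()
cache = {}


def slow_fetch(key):
    """Hypothetical stand-in for a network call; the sleep simulates latency."""
    time.sleep(0.05)
    return f"value-for-{key}"


def get_holding_lock(key):
    # Anti-pattern: the slow call runs inside the critical section,
    # so every other thread has to wait for it to finish.
    with cache_lock:
        if key not in cache:
            cache[key] = slow_fetch(key)
        return cache[key]


def get_outside_lock(key):
    # Look up under the lock, but release it before the slow call.
    with cache_lock:
        if key in cache:
            return cache[key]
    value = slow_fetch(key)  # no lock held while waiting
    with cache_lock:
        # Another thread may have stored a value in the meantime; keep the first.
        return cache.setdefault(key, value)


def timed(getter, keys):
    """Run getter(key) on one thread per key and return the elapsed seconds."""
    cache.clear()
    threads = [threading.Thread(target=getter, args=(k,)) for k in keys]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start
```

Calling `timed(get_holding_lock, range(8))` should take roughly eight times the simulated latency, because the fetches are serialized by the lock, while `timed(get_outside_lock, range(8))` should take about one latency period. The trade-off of the second version is that two threads asking for the same missing key may both fetch it, which is why the store uses `setdefault`.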