Showing 21 posts
When reviewing an incoming C++ PR last week, I left a comment along the lines: “merge local variable declaration with its initialization”. But why? Is this just a stylistic issue or is there something deeper to warrant making the change? Let’s look at stack frames, C, and then C++ to answer these questions.
The point of this post is simple and I’ll spoil it from the get go: every time you make an assumption in a piece of code, make such assumption explicit in the form of an assertion or error check. If you cannot do that (are you sure?), then write a detailed comment. In fact, I’m exceedingly convinced that the amount of assertion-like checks in a piece of code is a good indicator of the programmer’s expertise.
As most programming languages with support for functions, the shell offers locally-scoped variables. Unfortunately, local variables are not the default. You must explicitly declare variables as local and you should be very strict about doing this to prevent subtle but hard-to-diagnose bugs. That’s it! What else is there to say about this trivial keyword? As it turns out, more than you might think.
The shell supports defining functions, which, as we learned in the previous post, you should embrace and use. Unfortunately, they are fairly primitive and their use can, paradoxically, introduce other readability problems. One specific problem is that function parameters are numbered, not named, so the risk of cryptic code is high. Let’s see why this is a problem.
Our team develops Bazel, a Java-based tool. We do have, however, a significant amount of shell scripting. The percentage is small at only 3.6% of our codebase… but given the size of our project, that’s about 130,000 lines—a lot, really. Pretty much nobody likes writing these integration tests in shell. Leaving aside that our infrastructure is clunky, the real problem is that the team at large is not familiar with writing shell per se.
That’s it! After two months worth of posts, it is time to part with the the readability series. We have covered a lot of ground with these 14 posts: from mundane things such as blank lines and spelling to deeper topics involving dictionaries and global state. Before some parting words, here comes the full list of posts on this series. This is your time to catch up with any posts you have not yet read!
Try/catch blocks (or try/except, or whatever they happen to be named in your favorite language) are the mechanism by which you capture exceptions raised by a chunk of code in a controlled manner. Within that chunk of code, it does not matter which line throws the exception: any exception specified in the catch statement will be captured, and it is “impossible” to know at that point where it originated. Consider this piece of code:
Single assignment says that a variable should only be assigned a value once; i.e. a variable should only be initialized and never modified later. This could be said to be a property of functional programming—never mind that it’s used in compiler optimization a damn lot—so it may sound a bit out of scope in a readability post. Not really. When you force yourself to avoid modifying already-defined variables, you will need to find other ways to express your intent.
Conceptually, there are two kinds of conditional statements: ones that compute—or affect the computation of—a value and others that guard the execution of optional code (e.g. functionality enabled by a command-line flag). In this article we will focus on the former kind. One way to think of those conditionals is as if they were to represent functions, just written inline. In other words: while it should be possible to move the conditional statement to a separate function, it has not been done because the algorithm is simple enough to not warrant it.
In the most basic form, a conditional has two branches and the conditional expression inspects a single variable on entry. For example: if balance >= 0: color = BLACK else: color = RED In this little code snippet, the condition balance >= 0 only depends on one variable, and the two branches of the conditional cover all possible values of the input variable. I.e. the first branch covers all cases where the balance is positive or zero, and the second branch covers all cases where the balance is negative.
You know that passing state around different functions by using global variables is bad: it results in spaghetti code, it introduces side-effects to your functions and, well, is just bad practice. Then: don’t make the same mistake when using classes. The form this manifests in code is by having a particular class method (not necessarily the constructor!) initializing a member field and later having other unrelated methods querying the attribute “out of the blue”.
Yes: a dictionary is a data type. No: a dictionary is not a way to implement abstract data types; doing so is lazy programming and is asking for trouble later on. What do I mean by this? In Python and other similar dynamic languages, dictionaries are a mapping of keys to values that have no typing restrictions: the dictionary is heterogeneous, and a single dictionary can contain elements of different types both as its keys and its values.
Assertions are statements to ensure that a particular condition or state holds true at a certain point in the execution of a program. These checks are usually only performed in debug builds, which means that you must ensure that the expressions in the assertions are side-effect free. Because assertions are only validated in debug builds, you can abuse them to make your code more readable without impacting performance and without having to write a comment.
One of the best things you can do to improve the readability of your code is to avoid comments. “Uh, what?"—I hear you say—“That goes against all good coding guidelines, which state to heavily comment your code!” Right. Except that the real goal is to have valuable comments only, and doing so is a (very) complex matter. The code should document itself at all times, and be crystal-clear at doing so.
Wow. The previous post titled Self-interview after leaving the NetBSD board has turned out to be, by far, the most popular article in this blog. The feedback so far has been positive and I owe all of you a follow-up post. However, writing such post will take a while and content must keep flowing. So let’s get back to the readability series for now. In dynamically-typed languages1, variable and function definitions do not state the type of their arguments.
I can’t claim to be an expert writer—not in English, and not even in my native languages—but I do put a lot of attention to whatever I write: emails, presentations, letters and, of course, code. As it turns out, code has prose in it most of the time in the form of comments and documentation. So, when you write any text in your code, engrave this in mind: any obvious typos and/or any grammar mistake very quickly denote sloppiness in what you wrote.
When you are writing code, it is very tempting to shorten the names of functions and variables when the shortening seems blatantly obvious. All of us have, at some point, written calc_usage instead of calculate_usage; res instead of result; stmt instead of statement; ctx instead of context; etc. The thought goes “well, these abbreviations are obvious, and I’ll be a much faster developer!” Wrong. You might think you are faster at typing, but you don’t write code in one go and never ever get back to it again.
Vertical spacing is important for readability reasons: group together pieces of code that should not be split apart, and otherwise add blank lines among chunks of code that could be easily reordered and/or repurposed. That’s a pretty loose suggestion though, so let’s look at some specific situations in which you want to consider your vertical spacing practices (both adding and removing). Give some air to long functions Functions longer than a handful lines generally deserve some vertical spacing.
In a dynamically-typed language, it is common for the scoping semantics of a variable to be wider than a single code block. For example: in at least Python and the shell, it is the case that a variable defined anywhere within a function —even inside conditionals or loops— is reachable anywhere in the function from there on. To illustrate what this means, consider this snippet in which we define a function to compute the CPU requirements needed in a database system to support a set of tables:
Dear readers, This post is the beginning of a new series on the idioms and style that I use to keep code readable and obvious. I often get positive comments when undergoing peer reviews at Google and externally, so I am going to share such personal style in the hope that it might be useful to some of you. Explaining these ideas to coworkers when I review their code has proven to be useful in the long term.