While reviewing an incoming PR last week, I encountered a chunk of new code that looked similar to this:
{
    HRESULT hr;
    // Some code that doesn't touch hr.
    hr = CallSomeFunction(...);
    if (FAILED(hr)) {
        // Handle error.
    }
    // More stuff here that may or may not update hr.
    return hr;
}
… and one of my review comments was:
nit: Define HRESULT during initialization, like HRESULT hr = CallSomeFunction(...).
But why?
Declaring all local variables at the beginning of a block (in practice, usually the top of a function) was required in old versions of C, and the tradition continues to this day even though C started allowing declarations in the middle of a block with C99 (22 years ago!). But who cares? It’s “just a style”, isn’t it?
Well. The code snippet I showed above was actually from a C++ file, not C, and due to C++’s origins in C, this style has also propagated to C++ even though it was never required there. And, in fact, adhering to this style in C++ can be harmful and should be avoided.
So the questions are: Why did C require variables to be defined upfront? Why is it a bad idea to stick to this style nowadays in C++ and in C? Wouldn’t a sufficiently-smart compiler do the code transformation suggested above for us and make this a non-issue?
To answer these questions, we’ll look at the constraints of the compiler back when C was invented in the 1970s and see why the above is a style issue in C but a more problematic issue in C++. I haven’t checked extensively if my guesses are the true reasons, but it seems that they are pretty accurate based on the findings of other people. And if they aren’t, I’m sure some of you will correct me.
Stack frames
Let’s start deep today by diving into how function calls work in C at the machine level. To make the details concrete, let’s focus on the Intel x86 32-bit architecture. I know this architecture wasn’t the driving factor during the design of C in the 1970s (it didn’t exist yet), but it will help illustrate the rationale anyway.
Consider a function myfunc with parameters a and b and three local variables c, d and e. These parameters and local variables are all 32-bit (4-byte) signed integers. The actual code of the function is not important, but its skeleton looks like this:
int myfunc(int a, int b) {
    int c, d, e;
    // Unimportant code goes here.
    return c;
}
When we call this function with actual values, like with myfunc(7, 3), the compiler will generate code that looks like this:
push 3 ; Push second argument.
push 7 ; Push first argument.
call myfunc ; Result of myfunc is in eax.
add esp, 8 ; Remove 3 and 7 from the stack.
As you can see, the caller is responsible for pushing the arguments onto the stack (in reverse order), then calling the function, and then removing those arguments from the stack. The “removal” is done with an increment of the stack pointer register (esp) by 8, which corresponds to the size of the two 4-byte integer arguments.
Once myfunc gains control (that is, immediately after call jumps into the function), the stack looks like this:
Address | Relative offset | Content | Value |
---|---|---|---|
0x00987654 | esp | Return address | 0x000050a4 |
0x00987658 | esp + 4 | Argument a | 7 |
0x0098765c | esp + 8 | Argument b | 3 |
0x00987660 | esp + 12 | ... | ... |
As expected, we have the return address and the two arguments at the top of the stack. esp points to the last pushed value, and the table shows the offsets to the arguments relative to the current esp.
Now, when myfunc takes control, the first thing it does is prepare the stack to reserve space for all of its local variables. As part of this, and to take advantage of the way x86 relies on the base pointer (ebp) for relative data addressing, ebp is set up so that function arguments are referenced with positive offsets and local variables with negative offsets. And because ebp is modified, it must be saved first. In essence, and except for optimizations (like the use of the leave instruction or register allocation), every function looks like this:
myfunc:
    push ebp        ; Save the caller's ebp.
    mov ebp, esp    ; Set ebp (base pointer) to the current esp.
    sub esp, 12     ; Allocate 3 x 4 bytes on the stack for the local variables.
    ; Uninteresting code of the function.
    mov esp, ebp    ; Remove all local variables from the stack.
    pop ebp         ; Restore the caller's ebp.
    ret
In diagram form, the contents of the stack will look like this when the code of the function starts running:
Address | Relative offset | Content | Value |
---|---|---|---|
0x00987644 | ebp - 12 | Local e | Unknown |
0x00987648 | ebp - 8 | Local d | Unknown |
0x0098764c | ebp - 4 | Local c | Unknown |
0x00987650 | ebp | Caller's ebp | Anything |
0x00987654 | ebp + 4 | Return address | 0x000050a4 |
0x00987658 | ebp + 8 | Argument a | 7 |
0x0098765c | ebp + 12 | Argument b | 3 |
0x00987660 | ebp + 16 | ... | ... |
The above table depicts the stack frame of myfunc. The stack frame is the part of the stack that contains the arguments passed to the function, the return address, the saved ebp, and the local variables reserved by the function itself. In other words: the stack frame contains the information needed to execute the function and to return to the caller.
IMPORTANT: The key detail to note in the code above is the order in which operations happen: the function starts by reserving space for all the local variables it needs and then proceeds to do whatever it needs.
So the question is: when the compiler is generating code for the function, how does it know how much space to reserve for its local variables? To know how much space to reserve, the compiler needs to know what the local variables are and what their sizes are. And to do this, the compiler has to scan through the function’s code to spot the local variables.
This might seem like an easy problem to solve once the source code has been parsed into an AST, but we have to frame this discussion in the 1970s when C was first invented.
If the language requires local variable declarations to appear before the code, as versions of C before C99 did, then the compiler can compute their size and their ebp-relative addresses while reading the code, in a single pass. By the time the compiler reaches the code of the function, it already knows the total amount of space needed, so it can emit the stack frame setup instructions and then start parsing the statements and generating code for them.
And that’s likely why variables had to be defined upfront. The process described above is much easier to implement, especially when framed in the context of the 1970s when the language was first invented.
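To make the single-pass idea concrete, here is a minimal sketch (in C++ rather than 1970s C) of how a compiler could assign ebp-relative offsets while reading declarations in source order. The names LocalDecl, FrameLayout and LayOutLocals are made up for illustration, and the sketch ignores alignment and padding, which a real compiler would have to handle.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one local variable declaration.
struct LocalDecl {
    std::string name;
    std::size_t size;  // Size in bytes.
};

// Hypothetical result: where each local lives and how big the frame is.
struct FrameLayout {
    std::vector<std::pair<std::string, int>> offsets;  // Relative to ebp.
    std::size_t frame_size = 0;  // Operand for "sub esp, N".
};

// Walks the declarations once, in source order, and gives each local the
// next slot below ebp. No look-ahead into the function body is needed.
FrameLayout LayOutLocals(const std::vector<LocalDecl>& decls) {
    FrameLayout layout;
    for (const LocalDecl& decl : decls) {
        assert(decl.size > 0);
        layout.frame_size += decl.size;
        layout.offsets.emplace_back(
            decl.name, -static_cast<int>(layout.frame_size));
    }
    return layout;
}
```

Fed the three 4-byte locals c, d and e of myfunc, this yields offsets -4, -8 and -12 and a frame size of 12, which matches the stack frame table above and the operand of sub esp, 12.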
Finally, I glossed over a little detail in the stack frame diagram above (those Unknown labels in the Value column), but it’s also critical for our discussion:
IMPORTANT: Note how reserving space for the local variables is a simple decrement of the stack pointer (sub esp, 12). This is sufficient because C does not zero-initialize local variables.
Once again, this language implementation choice made a lot of sense in the 1970s because doing zero-initialization would have been costly and not competitive against hand-written assembly code.
C++ variable declarations
Enough about C. Let’s now shift to the more complex C++. But, as you know, C++ is built on C, so a lot of the discussion above carries over to C++. In particular, like C, C++ does not zero-initialize local variables by default. The exception is objects of classes with a default constructor, which are initialized by running that constructor (there are other modern initialization styles too, which I’ll ignore). Here is where things get tricky.
Let’s look at some code. For our discussion, we’ll use two classes: WithoutCtor, which is a class that does not declare any constructor; and WithCtor, which does. These classes look like this:
class WithoutCtor {
public:
    int field;
};
class WithCtor {
    int field;
public:
    WithCtor() : field(0) {}
    explicit WithCtor(int i) : field(i) {}
};
Let’s now create some instances of these classes, C-style:
void foo() {
    int a;
    WithoutCtor b;
    WithCtor c;
    // Some code.
}
This harmless-looking chunk of code behaves differently than what we saw earlier for C. By the time we reach foo’s code block, a, b, and even b.field are all uninitialized. But c is initialized and c.field is zero, because declaring c implicitly calls WithCtor’s default constructor. That’s actual work that had to happen at run time, so defining local variables like this may or may not be a simple stack decrement any more.
Now let’s say we add local variable initializations immediately after their declarations, like you typically see in C code:
void foo() {
    int a;
    WithoutCtor b;
    WithCtor c;
    a = 5;
    b.field = 5;
    c = WithCtor(5);
    // Some code.
}
In this case, by the time we reach the function’s code, all of a, b, and c have been initialized. Which is great… except that c was initialized twice via two separate function calls. Remember that the bare WithCtor c declaration had to call the default constructor, and then we have an explicit WithCtor(5) constructor call whose result is assigned over c. Not cheap, I’d say.
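One way to see the extra work is to count the calls. The sketch below uses a made-up instrumented class, CountingWithCtor, as a stand-in for WithCtor; the Counters struct and the function names are assumptions for illustration only.

```cpp
#include <cassert>

// Hypothetical tally of which member functions ran.
struct Counters {
    int default_ctor = 0;
    int int_ctor = 0;
    int assignments = 0;
};

// Instrumented stand-in for WithCtor (not the class from the text).
class CountingWithCtor {
    int field;
public:
    static Counters counters;

    CountingWithCtor() : field(0) { ++counters.default_ctor; }
    explicit CountingWithCtor(int i) : field(i) { ++counters.int_ctor; }
    CountingWithCtor(const CountingWithCtor&) = default;
    CountingWithCtor& operator=(const CountingWithCtor& other) {
        field = other.field;
        ++counters.assignments;
        return *this;
    }
    int value() const { return field; }
};

Counters CountingWithCtor::counters;

// C style: declare first, assign later.
int DeclareThenAssign() {
    CountingWithCtor c;       // Runs the default constructor.
    c = CountingWithCtor(5);  // Builds a temporary, then assigns it to c.
    return c.value();
}

// Declaration merged with initialization.
int InitializeAtDeclaration() {
    CountingWithCtor c(5);    // Runs only the int constructor.
    return c.value();
}

// Usage example: compares the work done by both styles.
void CompareBothStyles() {
    CountingWithCtor::counters = Counters{};
    assert(DeclareThenAssign() == 5);
    assert(CountingWithCtor::counters.default_ctor == 1);
    assert(CountingWithCtor::counters.int_ctor == 1);
    assert(CountingWithCtor::counters.assignments == 1);

    CountingWithCtor::counters = Counters{};
    assert(InitializeAtDeclaration() == 5);
    assert(CountingWithCtor::counters.default_ctor == 0);
    assert(CountingWithCtor::counters.int_ctor == 1);
    assert(CountingWithCtor::counters.assignments == 0);
}
```

Because incrementing the counters is an observable side effect, both functions return 5 but the first one performs three calls where the second performs one.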
Side-effects and translation units
But wait a minute. Shouldn’t the compiler be smart enough to realize that, in the snippet above, the c initialized with the default constructor was never read and so the call to that constructor is unnecessary? Can’t it be optimized away and leave us with super-duper-efficient code, turning this problem into a style issue only? Maybe, but not always, and probably only occasionally.
The reason is side-effects. Consider this updated version of WithCtor:
#include <iostream>

class WithCtor {
    int field;
public:
    WithCtor() : field(0) {
        std::cout << "Default constructor\n";
    }
    explicit WithCtor(int i) : field(i) {
        std::cout << "Explicit constructor\n";
    }
};

void foo() {
    WithCtor a;
    a = WithCtor(3);
}
In this piece of code, the compiler cannot omit the call to WithCtor’s default constructor: it must still call it because that constructor has side-effects. Namely, it prints a message to the screen.
And you might say: well, don’t do that. Constructors should only assign member variables and not do any work (which is great advice, by the way). If you follow that pattern, then the compiler will probably still figure out it’s doing useless work and optimize it away.
Except that… the compiler may not get to see the contents of the constructor at all. If the implementation of WithCtor’s constructors is in a different translation unit (.cpp file) than its caller foo, then the compiler has no clue about what side-effects those constructors might have, so it must still call them. Sure, you might try to declare these constructors pure (using non-standard attributes) but… who does that? And even then, I wouldn’t trust the compiler to always optimize the calls out.
Readability
But there is one more argument against the practice of separating declarations from initialization, and this argument is about readability—thus it applies to any language really.
A commonly cited rule of thumb is that the human mind can keep track of only about seven things at once. Every variable that appears in a function is an actor that your mind must keep track of, and the fewer actors there are, the easier it is to make sense of what’s going on. (Even the Linux kernel style guide says so.)
If you write code like this:
int a, b, c, d, e;
// 20-30 lines of code that modify a, b, d, e but NOT c.
c = a + b + d + e;
// Something else.
Then the reader’s mind must keep track of all 5 variables for the duration of the whole function. It doesn’t matter that the c variable is never touched for 20-30 lines: the reader must still be aware of it while reading the code just in case it shows up.
In contrast, if you do this:
int a, b, d, e;
// 20-30 lines of code that modify a, b, d and e.
int c = a + b + d + e;
// Something else.
Then it’s painfully clear that c is never touched in the first half of the code above simply because it doesn’t exist. Well, actually, c may have been allocated on the stack from the very beginning of the function, so from the compiler’s point of view the variable exists and the above makes no difference. But, to the reader, it does make a difference.
And as a smaller benefit: merging declarations with initializations can reduce the total number of lines in your code. The shorter the code, the more of it you can see at once, which also helps when navigating large files.
As a side note, that’s why declaring everything you can as const is helpful too. You might argue that the compiler doesn’t need that information, but humans do: by declaring variables as const, you are telling readers that the variables won’t change, so they won’t have to scan for other possible assignments in the sea of code.
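As a small sketch of both ideas combined, the hypothetical function below (Combine and CheckCombine are made-up names) declares each variable at the point where its value is known and marks it const, so a reader never has to look for a later assignment.

```cpp
#include <cassert>

// Hypothetical helper: every local is introduced where its value is
// first known and is never reassigned afterwards.
int Combine(int x, int y) {
    const int a = x + y;
    const int b = x - y;
    const int d = a * b;
    const int e = d + 1;
    // c only comes into existence here, once all its inputs are final.
    const int c = a + b + d + e;
    return c;
}

// Usage example: with x = 3 and y = 1 we get a = 4, b = 2, d = 8, e = 9.
void CheckCombine() {
    assert(Combine(3, 1) == 23);
}
```

If any of these variables were later assigned to by mistake, the compiler would reject the code, which is a guarantee the declare-everything-upfront style cannot offer.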
In conclusion
Circling back to the original example: was my comment about merging the HRESULT definition with its initialization important?
Not really, because HRESULT is a primitive integer type with no constructor to run: keeping the definition separate from the initialization has no implications on the machine code. However, the change is still worthwhile for readability reasons, and that’s why I marked the comment as a nit.
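For completeness, here is roughly what the suggested change could look like. This is only a sketch: the HRESULT alias, the kOk and kFailure values, the Failed helper, and the bool parameter are stand-ins I made up so the snippet compiles without the Windows SDK, where the real HRESULT type and FAILED macro live.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the Windows SDK definitions (assumptions, not the real ones).
using HRESULT = std::int32_t;
constexpr HRESULT kOk = 0;
constexpr HRESULT kFailure = -1;
inline bool Failed(HRESULT hr) { return hr < 0; }

// Hypothetical callee; the flag only exists to simulate both outcomes.
HRESULT CallSomeFunction(bool succeed) {
    return succeed ? kOk : kFailure;
}

HRESULT DoWork(bool succeed) {
    // Some code that doesn't touch hr: it cannot, hr does not exist yet.
    HRESULT hr = CallSomeFunction(succeed);
    if (Failed(hr)) {
        // Handle error.
    }
    // More stuff here that may or may not update hr.
    return hr;
}

// Usage example covering the success and failure paths.
void CheckDoWork() {
    assert(DoWork(true) == kOk);
    assert(Failed(DoWork(false)));
}
```

The generated machine code should be the same as before for a plain integer like this; the gain is that the reader meets hr exactly where it first carries a meaningful value.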