Definitions and initializations in C++

When reviewing an incoming C++ PR last week, I left a comment along the lines of: “merge local variable declaration with its initialization”. But why? Is this just a stylistic issue, or is there something deeper that warrants the change? Let’s look at stack frames, C, and then C++ to answer these questions.
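To make the comment concrete before jumping into the full post, here is a minimal sketch of the two styles in question; compute is a hypothetical helper that exists only for this illustration:

    int compute(void);   /* Hypothetical helper, for illustration only. */

    int separate_declaration(void)
    {
        int result;          /* Declared here, holding an indeterminate value... */
        result = compute();  /* ...and only assigned later; it cannot be const. */
        return result;
    }

    int merged_declaration(void)
    {
        const int result = compute();  /* Declared and initialized in one step; can even be const. */
        return result;
    }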

July 12, 2021 · Tags: <a href="/tags/c">c</a>, <a href="/tags/c&#43;&#43;">c&#43;&#43;</a>, <a href="/tags/readability">readability</a>
Continue reading (about 11 minutes)

Unused parameters in C and C++

Today I would like to dive into the topic of unused parameters in C and C++: why they may happen and how to properly deal with them—because smart compilers will warn you about their presence should you enable -Wunused-parameter or -Wextra, and even error out if you are brave enough to use -Werror. Why may unused parameters appear? You would think that unused parameters should never exist: if the parameter is not necessary as an input, it should not be there in the first place! That’s a pretty good argument, but it does not hold when polymorphism enters the picture: if you want to have different implementations of a single API, such an API will have to provide, on input, a superset of all the data required by all the possible implementations.
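The excerpt cuts off before the post's actual advice, but as a quick illustration of the situation it describes, here is the common way to silence the warning when a callback must match a fixed signature; the names are made up for this example only:

    #include <stdio.h>

    /* A callback that must match a fixed signature even though this particular
     * implementation only needs one of its parameters.  Casting the unused one
     * to void silences -Wunused-parameter and documents that the omission is
     * intentional.  (In C++, simply leaving the parameter unnamed achieves the
     * same effect.) */
    static void on_event(int event_id, void* user_data) {
        (void)user_data;  /* Required by the interface, unused here. */
        printf("got event %d\n", event_id);
    }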

February 16, 2015 · Tags: <a href="/tags/c">c</a>, <a href="/tags/cxx">cxx</a>
Continue reading (about 6 minutes)

Using va_copy to safely pass ap arguments around

Update (2014-12-19): The advice provided in this blog post is questionable and, in fact, probably incorrect. The bug described below must have happened for some unrelated reason (like, maybe, reuse of ap), but at this point (three years later!) I do not really remember what was going on here nor have much interest in retrying. A long time ago, while I was preparing an ATF release, I faced many failing tests and crashes in one of the platforms under test. My memory told me this was a problem in OpenSolaris, but the repository logs say that the problem really happened in Fedora 8 x86_64. The problem manifested itself as segmentation faults pretty much everywhere, and I could trace such crashes down to pieces of code like the following, of which the C code of ATF is full:

    void foo_fmt(const char *fmt, ...)
    {
        va_list ap;

        va_start(ap, fmt);
        foo_ap(fmt, ap);
        va_end(ap);
    }

    void foo_ap(const char *fmt, va_list ap)
    {
        char buf[128];

        vsnprintf(buf, sizeof(buf), fmt, ap);
        ... now, do something with buf ...
    }

The codebase of ATF provides _fmt and _ap variants for many functions to give more flexibility to the caller and, as shown above, the _fmt variant just relies on the _ap variant to do the real work. Now, the crashes that appeared from the code above seemed to come from the call that consumes the ap argument, which in this case is vsnprintf. Interestingly, though, all the tests in platforms other than Linux x86_64 worked just fine, and this included OpenSolaris, other Linux distributions, some BSDs and even different hardware platforms. As it turned out, you cannot blindly pass ap arguments around because they are not "normal" parameters (even though, unfortunately, they look like them!). On most platforms, the ap element will just be an "absolute" pointer into the stack, so passing the variable to an inner function call is fine because the caller's stack has not been destroyed yet and, therefore, the pointer is still valid. But... the ap argument can have other representations. It could be an offset into the stack instead of a pointer, or it could be a data structure that holds all the variable parameters. If, for example, the ap argument held an offset, passing it to an inner function call would make such an offset point to "garbage" because the stack would have grown due to the new call frame. (I haven't investigated what specific representation x86_64 uses.) The solution is to use va_copy to generate a new ap object that is valid for the current stack frame. This is easy: as an example, we rewrite the foo_ap function above as follows:

    void foo_ap(const char *fmt, va_list ap)
    {
        char buf[128];
        va_list ap2;

        va_copy(ap2, ap);
        vsnprintf(buf, sizeof(buf), fmt, ap2);
        va_end(ap2);
        ... now, do something with buf ...
    }

This duplication of the ap argument pointing to the variable list of arguments ensures that ap2 can be safely used from the new stack frame.

September 12, 2011 · Tags: <a href="/tags/c">c</a>, <a href="/tags/portability">portability</a>
Continue reading (about 3 minutes)

Validating format strings in custom C functions

In C, particularly due to the lack of dynamic strings, it's common to pass format strings around together with a variable set of arguments. A prototype like this is very common:

    void my_printf(const char*, ...);

For the standard printf and similar functions, some compilers will ensure that the variable list of arguments matches the positional parameters in the format string and, if they don't match, raise a warning. This is, however, just a warning "hardcoded" to match these functions, as the compiler can't know how the variable arguments of our custom my_printf function relate to the first argument. Or can it? I was made aware of a nice GCC attribute that allows developers to tag printf-like functions in a manner that allows the compiler to perform the same validation of variable arguments and format strings. This is in the form of a GCC __attribute__ that also happens to work with Clang. Let's see an example to illustrate how this works:

    #include <stdarg.h>
    #include <stdio.h>

    static void my_printf(const char*, ...)
        __attribute__((format(printf, 1, 2)));

    static void my_printf(const char* format, ...)
    {
        va_list ap;
        printf("Custom printf: ");
        va_start(ap, format);
        vprintf(format, ap);
        va_end(ap);
    }

    int main(void)
    {
        my_printf("this is valid %d\n", 3);
        my_printf("but this is not %f\n", 3);
    }

If we compile the code above:

    $ clang example.c
    example.c:22:33: warning: conversion specifies type 'double' but the argument has type 'int' [-Wformat]
        my_printf("but this is not %f\n", 3);
                                   ~^     ~
    1 warning generated.

Very useful. This function attribute has been applied to many functions in the NetBSD tree and many bugs have been spotted thanks to it. Instead of me explaining how the format attribute works, I'll refer you to the official documentation. The attribute recognizes several format styles and takes different arguments depending on them, so it is a bit tricky to explain. Plus, if you look at the extensive list of attributes, you may find some useful stuff ;-) Happy debugging!

June 17, 2011 · Tags: <a href="/tags/c">c</a>
Continue reading (about 2 minutes)

Use explicit conditionals

In C — or, for that matter, several other languages such as Python or C++ — most native types can be coerced to a boolean type: expressions that deliver integers, pointers or characters are automatically treated as boolean values whenever needed. For example: non-zero integer expressions and non-NULL pointers evaluate to true, whereas zero or NULL evaluate to false. Many programmers take advantage of this fact by stating their conditionals like this:

    void func(const int in)
    {
        if (in) {
            ... do something when in != 0 ...
        } else {
            ... do something else when in == 0 ...
        }
    }

... or even do something like:

    bool func(const struct mystruct *ptr)
    {
        int out = calculate_out(ptr);
        ... do something more with out ...
        return out;  // The return type is bool though; is this ok?
    }

... but such idioms are sloppy and can introduce confusion or bugs. Yes, things work in many cases, but "work" is not a synonym for "good" ;-) Taking advantage of coercions (automatic type conversions) typically makes the code harder to follow. In statically-typed languages, whenever you define a variable to be of a particular type, you should adhere to said type by all means so that no confusion is possible. Conceptually, an integer is not a boolean and therefore it should not be treated as such. But this is just a matter of style. On the other hand, the automatic conversions can lead to hidden bugs: for example, in the second function above, did we really want to convert the "out" value to a boolean or was that a mistake? This is not so much a matter of style but a matter of careful programming, and having as much state as possible "in your face" can help prevent these kinds of trivial errors. In contrast, one could argue that having to provide explicit checks or casts on every expression is a waste of time. But keep in mind that code is supposed to be written once and read many times. Anything you can do to communicate your intent to the reader will surely be appreciated. Disclaimer: the above is my personal opinion only. Not following the style above should have no implications on the resulting binary code. (Wow, this post had been sitting as a draft around here for far too long.)
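For contrast, here is how those two snippets read with the conditions and conversions spelled out explicitly. This is only a sketch of the style being advocated, reusing the post's own placeholder names (struct mystruct and calculate_out are hypothetical):

    #include <stdbool.h>

    struct mystruct;                                /* Placeholder, as in the post. */
    int calculate_out(const struct mystruct *ptr);  /* Hypothetical helper, as in the post. */

    void func(const int in)
    {
        if (in != 0) {  /* Explicit comparison: the reader sees an integer check. */
            /* ... do something when in != 0 ... */
        } else {
            /* ... do something else when in == 0 ... */
        }
    }

    bool func2(const struct mystruct *ptr)
    {
        const int out = calculate_out(ptr);
        /* ... do something more with out ... */
        return out != 0;  /* The conversion to bool is now clearly intentional. */
    }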

April 30, 2011 · Tags: <a href="/tags/c">c</a>
Continue reading (about 2 minutes)

Error handling in Lua

Some of the methods of the Lua C API can raise errors. To get an initial idea of what these are, take a look at the Functions and Types section and pay attention to the third field of a function description (the one denoted by 'x' in the introduction). Dealing with the errors raised by these functions is tricky, not to say a nightmare. Also, the ridiculously short documentation on this topic does not help. This post is dedicated to explaining how these errors may be handled, along with the advantages and disadvantages of each approach. The Lua C API provides two modes of execution: protected and unprotected. When in protected mode, all errors caused by Lua are caught and reported to the caller in a controlled manner. When in unprotected mode, the errors just abort the execution of the calling process by default. So, one would think: just run the code in protected mode, right? Yeah, well... entering protected mode is nontrivial and it has its own particularities that make interaction with C++ problematic. Let's analyze error reporting by considering a simple example: the lua_gettable function. The following Lua code would error out when executed:

    my_array = nil
    return my_array["test"]

... which is obvious because indexing a non-table object is a mistake. Now let's consider how this code would look in C (modulo the my_array assignment):

    lua_getglobal(state, "my_array");
    lua_pushstring(state, "test");
    lua_gettable(state, -2);

Simple, huh? Sure, but as it turns out, any of the API calls (not just lua_gettable) in this code can raise errors (I'll call them unsafe functions). What this means is that, unless you run the code with a lua_pcall wrapper, your program will simply exit in the face of a Lua error. Uh, your scripting language can "crash" your host program out of your control? Not nice. What would be nice is if each of the unsafe Lua C API functions reported an error (as a return value or whatever) and allowed the caller to decide what to do. Ideally, no state would change in the face of an error. Unfortunately, that is not the case, but it is exactly what I would like to have. I am writing a C++ wrapper for Lua in the context of Kyua, and fine granularity in error reporting means that automatic cleanup of resources managed by RAII is trivial. Let's analyze the options that we have to control errors caused within the Lua C API. I will explain in a later post the one I have chosen for the wrapper in Kyua (it has to be later because I'm not settled yet!).

Install a panic handler
Whenever Lua code runs in an unprotected environment, one can use lua_atpanic to install a handler for errors. The function provided by the user is executed when the error occurs and, if the panic function returns, the program exits. To prevent exiting prematurely, one could opt for two mechanisms. The first is to make the panic handler raise a C++ exception. Sounds nice, right? Well, it does not work. The Lua library is generally built as a C binary, which means that our panic handler will be called from within a C environment. As a result, we cannot throw an exception from our C++ handler and expect things to work: the exception won't propagate correctly from a C++ context to a C context and then back to C++. Most likely, the program will abort as soon as we leave the C++ world and enter C to unwind the stack. The second is to use setjmp before the call to the unsafe Lua function and recover with longjmp from within the panic handler. It turns out that this does work, but with one important caveat: the stack is completely cleared before the call to the panic handler. As a result, this breaks the desirable property of "leave the stack unmodified on failure" that we would want of any function (report errors early, before changing state).

Run every single call in a protected environment
This is doable but complex and not completely right: to do this, we need to write a C wrapper function for every unsafe API function and run it with lua_pcall (see the sketch after this post's excerpt). The overhead of this approach is significant: something as simple as a call to lua_gettable turns into several stack manipulation operations, a call to lua_pcall and then further stack modifications to adjust the results. Additionally, in order to prepare the call to lua_pcall, one has to use the various lua_push* functions to set up the stack for the call. And, guess what, most of these functions that push values onto the stack can themselves fail. So... in order to prepare the environment for a safe call, we are already executing unsafe calls. (Granted, the errors in this case are only due to memory exhaustion... but still, the solution is not fully robust.) Lastly, note that we cannot use lua_cpcall because it discards all return values of the executed function, which means that we can't really wrap single Lua operations. (We could wrap a whole algorithm, though.)

Run the whole algorithm in a protected environment
This defeats the whole purpose of the per-function wrapping. We would need to provide a separate C/C++ function that runs all the unsafe code and then call it by means of lua_pcall (or lua_cpcall) so that errors are captured and reported in a controlled manner. This seems efficient... albeit not transparent, and it will surely cause issues. Why is this problematic? Errors that happen inside the protected environment are managed by means of a longjmp. If the code wrapped by lua_pcall is a C++ function, it can instantiate objects. These objects have destructors. A longjmp out of the function means that no destructors will run... so objects will leak memory, file descriptors, and anything else you can imagine. Doomsday. Yes, I know Lua can be rebuilt to report internal errors by means of exceptions, which would make this particular problem a non-issue... but this rules out any pre-packaged Lua binaries (the default is to use longjmp, and hence that is what packaged binaries use). I do not want to embed Lua into my source tree. I want to use the Lua binary packages shipped with pretty much any OS (hey, including NetBSD!), which means that my code needs to be able to cope with Lua binaries that use setjmp/longjmp internally.

Closing remarks
I hope the above description makes some sense because I had to omit many, many details in order to keep the post reasonably short. It could also be that there are other alternatives I have not considered, in which case I'd love to know them. Trying to find a solution to the above problem has already sucked several days of my free time, which translates into Kyua not seeing any further development until a solution is found!
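As a rough sketch of what the "wrap every call" option looks like in practice, the lua_gettable example above could be protected as follows. This assumes the Lua 5.1 C API; safe_gettable is a name made up for this illustration and is not the wrapper that ended up in Kyua:

    /* Runs in protected mode: receives the table and the key as its two
     * arguments on the stack and leaves the looked-up value as its result. */
    static int safe_gettable(lua_State* state)
    {
        lua_gettable(state, -2);  /* May raise, but we are protected here. */
        return 1;  /* One result: the value pushed by lua_gettable. */
    }

    /* Caller side.  Note that the preparatory pushes below are themselves
     * unsafe calls, which is exactly the weakness described in the post. */
    lua_pushcfunction(state, safe_gettable);
    lua_getglobal(state, "my_array");
    lua_pushstring(state, "test");
    if (lua_pcall(state, 2, 1, 0) != 0) {
        /* Handle the error; the message is now on top of the stack. */
    }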

January 7, 2011 · Tags: <a href="/tags/c">c</a>, <a href="/tags/cxx">cxx</a>, <a href="/tags/lua">lua</a>
Continue reading (about 6 minutes)

Understanding setjmp/longjmp

For a long time, I have been aware of the existence of the standard C functions setjmp and longjmp and of the fact that they can be used to simulate exceptions in C code. However, it wasn't until yesterday that I had to use them... and it was not trivial. The documentation for these functions tends to be confusing, and understanding them required looking for additional documents and a bit of experimentation. Let's see if this post helps in clarifying how these functions work. The first call to setjmp causes the process state (stack, CPU registers, etc.) to be saved in the provided jmp_buf structure and, then, a value of 0 to be returned. A subsequent call to longjmp with the same jmp_buf structure causes the process to go "back in time" to the state stored in said structure. The way this is useful is that, when going back in time, we tweak the return value of the setjmp call so we can actually run a second (or third or more) path as if nothing had happened. Let's see an example:

    #include <setjmp.h>
    #include <stdio.h>
    #include <stdlib.h>

    static jmp_buf buf;

    static void myfunc(void)
    {
        printf("In the function.\n");
        ... do some complex stuff ...
        /* Go back in time: restore the execution context of setjmp
         * but make the call return 1 instead of 0. */
        longjmp(buf, 1);
        printf("Not reached.\n");
    }

    int main(void)
    {
        if (setjmp(buf) == 0) {
            /* Try block. */
            printf("Trying some function that may throw.\n");
            myfunc();
            printf("Not reached.\n");
        } else {
            /* Catch block. */
            printf("Exception caught.\n");
        }
        return EXIT_SUCCESS;
    }

The example above shows the following when executed:

    Trying some function that may throw.
    In the function.
    Exception caught.

So, what happened above? The code starts by calling setjmp to record the execution state, and the call returns 0, which causes the first part of the conditional to run. You can think of this clause as the "try" part of exception-based code. At some point during the execution of myfunc, an error is detected and is "thrown" by a call to longjmp with a value of 1. This causes the process to go back to the execution of setjmp, but this time the call returns 1, which causes the second part of the conditional to run. You can think of this second clause as the "catch" part of exception-based code. It is still unclear to me what the "execution context" stored in jmp_buf is: the documentation does not explain what kind of resources are correctly unwound when the call to longjmp is made... which makes me wary of using this technique for exception-like handling purposes. Oh, and this is even less clear in the context of C++ code and, e.g., calls to destructors. It would be nice to expand the description of these APIs in the manual pages.

January 2, 2011 · Tags: <a href="/tags/c">c</a>
Continue reading (about 3 minutes)

Using RAII to clean up temporary values from a stack

For the last couple of days, I have been playing around with the Lua C API and have been writing a thin wrapper library for C++. The main purpose of this auxiliary library is to ensure that global interpreter resources such as the global state or the execution stack are kept consistent in the presence of exceptions — and, in particular, that none of these are leaked due to programming mistakes when handling error codes. To illustrate this point, let's forget about Lua and consider a simpler case. Suppose we lost the ability to pass arguments and return values from functions in C++ and all we have is a stack that we pass around. With this in mind, we could implement a multiply function as follows:

    void multiply(std::stack< int >& context)
    {
        const int arg1 = context.top(); context.pop();
        const int arg2 = context.top(); context.pop();
        context.push(arg1 * arg2);
    }

And we could call our function like this:

    std::stack< int > context;
    context.push(5);
    context.push(6);
    multiply(context);
    const int result = context.top(); context.pop();

In fact, my friends, this is more or less what your C/C++ compiler is internally doing when converting code to assembly language. The way the stack is organized to perform calls is known as the calling conventions of an ABI (a language/platform combination). Anyway, back to our point. One important property of such a stack-based system is that any function that deals with the stack must leave it in a consistent state: if the function pushes temporary values (read: local variables) onto the stack, such temporary values must be gone upon return no matter how the function terminates. Otherwise, the caller will not find the stack as it expects, which will surely cause trouble at a later stage. The above example works just fine because our function is extremely simple and does not leave any temporary values on the stack. But things get messier when our functions can fail halfway through and, in particular, when such failures are signaled by exceptions. In those cases, the function aborts abruptly and must take care to clean up any values that may still be left on the stack. Let's consider another example:

    void magic(std::stack< int >& context)
    {
        const int arg1 = context.top(); context.pop();
        const int arg2 = context.top(); context.pop();

        context.push(arg1 * arg2);
        context.push(arg1 / arg2);
        try {
            ... do something with the two values on top ...

            context.push(arg1 - arg2);
            try {
                ... do something with the three values on top ...
            } catch (...) {
                context.pop();  // arg1 - arg2
                throw;
            }
            context.pop();
        } catch (...) {
            context.pop();  // arg1 / arg2
            context.pop();  // arg1 * arg2
            throw;
        }

        context.pop();
        context.pop();
    }

The above is a completely fictitious and useless function, but it serves to illustrate the point. magic() starts by pushing two values on the stack and then performs some computation that reads these two values. It later pushes an additional value and does some more computations on the three temporary values that are on the top of the stack. The "problem" is that the computation code can throw an exception. If it does, we must sanitize the stack to remove the two or three values we have already pushed. Otherwise, the caller will receive the exception, it will assume nothing has happened, and it will leak values on the stack (a bad thing). To prevent this, we have added a couple of try/catch clauses to capture these possible exceptions and to clean up the already-pushed values before exiting the function.
Unfortunately, this gets old very quickly: having to add try/catch statements surrounding every call is boring, ugly, and hard to read (remember that, potentially, any statement can throw an exception). You can see this in the example above with the two nested try/catch blocks. To mitigate this situation, we can apply a RAII-like technique to make popping elements on errors completely transparent and automated. If we can make it transparent, writing the code is easier and reading it is trivial; if we can make it automated, we can be certain that our error paths (rarely tested!) correctly clean up any global state. In C++, destructors are deterministically executed whenever a variable goes out of scope, so we can use this to our advantage to clean up temporary values. Let's consider this class:

    class temp_stack {
        std::stack< int >& _stack;
        int _pop_count;

    public:
        temp_stack(std::stack< int >& stack_) : _stack(stack_), _pop_count(0) {}

        ~temp_stack(void)
        {
            while (_pop_count-- > 0)
                _stack.pop();
        }

        void push(int i)
        {
            _stack.push(i);
            _pop_count++;
        }
    };

With this, we can rewrite our function as:

    void magic(std::stack< int >& context)
    {
        const int arg1 = context.top(); context.pop();
        const int arg2 = context.top(); context.pop();

        temp_stack temp(context);
        temp.push(arg1 * arg2);
        temp.push(arg1 / arg2);
        ... do something with the two values on top ...
        temp.push(arg1 - arg2);
        ... do something with the three values on top ...
        // Yes, we can return now.  No need to do manual pop()s!
    }

Simple, huh? Our temp_stack class keeps track of how many elements have been pushed on the stack. Whenever the function terminates, be it due to reaching the end of the body or due to an exception thrown anywhere, the temp_stack destructor will remove all previously registered elements from the stack. This ensures that the function leaves the global state (the stack) as it was on entry — modulo the function parameters consumed as part of the calling conventions. So how does all this play together with Lua? Well, Lua maintains a stack to communicate parameters and return values between C and Lua. Such a stack can be managed in a similar way with a RAII class, which makes it very easy to write native functions that deal with the stack and clean it up correctly in all cases. I would like to show you some non-fictitious code right now, but it's not ready yet ;-) When it is, it will be part of Kyua. Stay tuned! And, to conclude: to make C++ code robust, wrap objects that need manual clean-up (pointers, file descriptors, etc.) with small wrapper classes that perform such clean-up on destruction. These classes are typically fully inlined and contain only a member field or two, so they do not impose any performance penalty. In exchange, your code avoids the need for many try/catch blocks, which are tricky to get right and hard to validate. (Unfortunately, this technique cannot be applied in, e.g., Java or Python, because the execution of class destructors/finalizers there is non-deterministic and not guaranteed to happen at all!)

December 27, 2010 · Tags: <a href="/tags/c">c</a>, <a href="/tags/cxx">cxx</a>, <a href="/tags/kyua">kyua</a>, <a href="/tags/lua">lua</a>
Continue reading (about 6 minutes)

Child-process management in C for ATF

Let's face it: spawning child processes in Unix is a "mess". Yes, the interfaces involved (fork, wait, pipe) are really elegant and easy to understand, but every single time you need to spawn a new child process to, later on, execute a random command, you have to write quite a bunch of error-prone code to cope with it. If you have ever used any other programming language with higher-level abstraction layers — just check Python's subprocess.Popen — you surely understand what I mean. The current code in ATF has many places where child processes have to be spawned. I recently had to add yet another case of this, and... enough was enough. Since then, I've been working on a C API to spawn child processes from within ATF's internals and just pushed it to the repository. It's still fairly incomplete, but with minor tweaks, it'll keep all the dirty details of process management contained in a single, one-day-to-be-portable module. The interface tries to mimic the one that was designed in my Boost.Process Summer of Code project, but in C, which is quite painful. The main idea is to have a fork function to which you pass the subroutine you want to run in the child, the behavior you want for the stdout stream and the behavior you want for the stderr stream. These behaviors can be any of capture (aka create pipes for IPC communications), silence (aka redirect to /dev/null), redirect to a file descriptor and redirect to a file. For simplicity, I've omitted stdin. With all this information, the fork function returns an opaque structure representing the child, from which you can obtain the IPC channels if you requested them and on which you can wait for finalization. Here is a little example, with tons of details such as error handling or resource finalization removed for simplicity. The code below would spawn "/bin/ls" and store its output in two files named ls.out and ls.err:

    static atf_error_t run_ls(const void *v)
    {
        system("/bin/ls");
        return atf_no_error();
    }

    static void some_function(...)
    {
        atf_process_stream_t outsb, errsb;
        atf_process_child_t child;
        atf_process_status_t status;

        atf_process_stream_init_redirect_path(&outsb, "ls.out");
        atf_process_stream_init_redirect_path(&errsb, "ls.err");

        atf_process_fork(&child, run_ls, &outsb, &errsb, NULL);
        ... yeah, here comes the concurrency! ...
        atf_process_child_wait(&child, &status);

        if (atf_process_status_exited(&status))
            printf("Exit: %d\n", atf_process_status_exitstatus(&status));
        else
            printf("Error!");
    }

Yeah, quite verbose, huh? Well, it's the price to pay to simulate namespaces and other similar things in C. I'm not too happy with the interface yet, though, because I've already encountered a few gotchas when trying to convert some of the existing old fork calls to the new module. But, should you want to check the whole mess, check out the corresponding revision.

June 21, 2009 · Tags: <a href="/tags/atf">atf</a>, <a href="/tags/boost-process">boost-process</a>, <a href="/tags/c">c</a>
Continue reading (about 3 minutes)

Making ATF 'compiler-aware'

For a long time, ATF has shipped with build-time tests for its own header files to ensure that these files are self-contained and can be included from other sources without having to manually pull in obscure dependencies. However, the way I wrote these tests has been a hack since the first day: I use automake to generate a temporary library that builds small source files, each one including one of the public header files. This approach works but has two drawbacks. First, if you do not have the source tree, you cannot reproduce these tests -- and one of ATF's major features is the ability to install tests and reproduce them even if you install from binaries, remember? And second, it's not reusable: I now find myself needing to do this exact same thing in another project... what if I could just use ATF for it? Even if the above were not an issue, build-time checks are a nice thing to have in virtually every project that installs libraries. You need to make sure that the installed library is linkable to new source code and, currently, there is no easy way to do this. As a matter of fact, the NetBSD tree has such tests, and they haven't been migrated to ATF for a reason. I'm trying to implement this in ATF at the moment. However, running the compiler in a transparent way is a tricky thing. Which compiler do you execute? Which flags do you need to pass? How do you provide a portable-enough interface for the callers? The approach I have in mind involves caching the compiler and flags used to build ATF itself and using those as defaults wherever ATF needs to run the compiler. Then, make ATF provide some helper check functions that call the compiler for specific purposes and hide all the required logic inside them. That should work, I expect. Any better ideas?

March 5, 2009 · Tags: <a href="/tags/atf">atf</a>, <a href="/tags/c">c</a>, <a href="/tags/cxx">cxx</a>
Continue reading (about 2 minutes)

ATF's error handling in C

One of the things I miss a lot when writing the C-only code bits of ATF is an easy way to raise and handle errors. In C++, the normal control flow of the execution is not disturbed by error handling because any part of the code is free to notify error conditions by means of exceptions. Unfortunately, C has no such mechanism, so errors must be handled explicitly. At the very beginning I just made functions return integers indicating error codes, reusing the standard error codes of the C library. However, that turned out to be too simple for my needs and, when a function's natural return value was not an integer, it was not easily applicable. What I ended up doing was defining a new type, atf_error_t, which must be returned by all functions that can raise errors. This type is a pointer to a memory region that can vary in contents (and size) depending on the error raised by the code. For example, if the error comes from libc, I mux the original error code and an informative message into the error type so that the original, non-mangled information is available to the caller; or, if the error is caused by the user's misuse of the application, I simply return a string that contains the reason for the failure. The error structure contains a type field that the receiver can query to know which specific information is available and, based on that, cast down the structure to the specific type that contains the detailed information. Yes, this is very similar to how you work with exceptions. In the case of no errors, a null pointer is returned. This way, checking for an error condition is just a simple pointer check, which is no more expensive than an integer check. Handling error conditions is more costly, but given that these are rare, it is certainly not a problem. What I don't like too much about this approach is that any other return value must be returned as an output parameter, which makes things a bit confusing. Furthermore, robust code ends up cluttered with error checks all around, given that virtually any call to the library can produce an error somewhere. This, together with the lack of RAII modeling, complicates error handling a lot. But I can't think of any other way that could be simpler but, at the same time, as flexible as this one. Ideas? :P More details are available in the atf-c/error.h and atf-c/error.c files.
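For readers who want a picture of the pattern without opening those files, here is a stripped-down sketch of the idea in the post's own terms. All names and the struct layout are illustrative only and do not reproduce the actual atf-c API:

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Illustrative only: a tagged, heap-allocated error record where NULL
     * means "no error", mirroring the design described above. */
    struct error {
        const char *type;   /* Tag queried by the receiver, e.g. "libc" or "user". */
        int errno_value;    /* Only meaningful for "libc" errors. */
        char message[128];  /* Informative message for the caller. */
    };
    typedef struct error *my_error_t;

    static my_error_t no_error(void) { return NULL; }

    static my_error_t libc_error(int errno_value, const char *msg)
    {
        my_error_t err = malloc(sizeof(*err));
        if (err == NULL)
            abort();  /* Out of memory; keep the sketch simple. */
        err->type = "libc";
        err->errno_value = errno_value;
        snprintf(err->message, sizeof(err->message), "%s", msg);
        return err;
    }

    /* A function that can fail returns the error type; its real result is an
     * output parameter, as noted in the post. */
    static my_error_t parse_number(const char *text, long *out)
    {
        char *end;
        *out = strtol(text, &end, 10);
        if (*end != '\0')
            return libc_error(EINVAL, "not a number");
        return no_error();
    }

    int main(void)
    {
        long value;
        my_error_t err = parse_number("123x", &value);
        if (err != NULL) {  /* A plain pointer check, as cheap as an integer check. */
            fprintf(stderr, "error (%s): %s\n", err->type, err->message);
            free(err);
            return EXIT_FAILURE;
        }
        printf("parsed %ld\n", value);
        return EXIT_SUCCESS;
    }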

February 24, 2008 · Tags: <a href="/tags/atf">atf</a>, <a href="/tags/c">c</a>
Continue reading (about 2 minutes)

Rewriting parts of ATF in C

I have spent part of last week and this whole weekend working on a C-only library for ATF test programs. An extremely exhausting task. However, I wanted to do it because there is reluctance in NetBSD to write test programs in C++, which is understandable, and delaying it more would have made things worse in the future. I ran into this situation myself some days ago when writing tests for very low level stuff; using C++ there felt clunky, but it was still possible of course. I have had to reimplement lots of stuff that is given for free in any other, higher-level (not necessarily high-level) language. This includes, for example, a "class" to deal with dynamic strings, another one for dynamic linked lists and iterators, a way to propagate errors until the point where they can be managed... and I have spent quite a bit of time debugging crashes due to memory management bugs, something that I rarely encountered in the C++ version. However, the new interface is, I believe, quite neat. This is not because of the language per se, but because the C++ interface has grown "incorrectly". It was the first code in the project and it shows. The C version has been written from the ground up with all the requirements known beforehand, so it is cleaner. This will surely help in cleaning up the C++ version later on, which cannot die anyway. The code for this interface is in a new branch, org.NetBSD.atf.src.c, and will hopefully make it to ATF 0.5: it still lacks a lot of features, which is why it is not on mainline yet. Ah, the joys of a distributed VCS: I have been able to develop this experiment locally and privately until it was decent enough to be published, and now it is online with all its history available! From now on, C++ use will be restricted to the ATF tools inside ATF itself, and to those users who want to use it in their own projects. Test cases will be written using the C library, except for those that unit-test C++ code.

February 18, 2008 · Tags: <a href="/tags/atf">atf</a>, <a href="/tags/c">c</a>
Continue reading (about 2 minutes)

Is assembly code faster than C?

I was reading an article the other day and found an assertion that bugged me. It reads: "System 6.0.8 is not only a lot more compact since it has far fewer (mostly useless) features and therefore less code to process, but also because it was written in assembly code instead of the higher level language C. The lower the level of the code language, the less processing cycles are required to get something done." It is not the first time I have seen someone claim that writing programs in assembly by hand makes them faster, and I'm sure it will not be the last. This assertion is, simply put, wrong. Back in the (good?) old days, processors were very simple: they fetched an instruction from main memory, executed it and, once finished (and only then), they fetched the next instruction and repeated the process. On the other hand, compilers were very primitive and their optimization engines were, I dare say, non-existent. In such a scenario, a good programmer could really optimize any program by writing it in assembly instead of in a high-level language: he was able to understand very well how the processor internally behaved and what the outcomes of each machine-level instruction were. Furthermore, he could get rid of all the "bloat" introduced by a compiler. Things have changed a lot since then. Today's processors are very complex devices: they have a very deep execution pipeline that, at a given time, can be executing dozens of instructions at once. They have powerful branch prediction units. They reorder instructions at run time and execute them in an out-of-order way (provided they respect the data dependencies among them). There are memory caches everywhere. So... it is, simply put, almost impossible for a programmer's brain to keep track of all these details and produce efficient code. (And even if he could, the result would be so tied to a specific microprocessor version that it'd be useless in all other cases.) Furthermore, compilers now have much better optimization stages than before and are able to keep track of all these processor-specific details. For example, they can reorder instructions on their own or insert prefetching operations at key points to avoid cache misses. They can really do a much better job of converting code to assembly than a programmer would in most cases. But hey! Of course it is still possible and useful to manually write optimized routines in assembly language — to make use of SIMD extensions, for example — but these routines tend to be as short and as simple as possible. So, summarizing: it no longer makes sense to write big programs (such as a complete operating system) in assembly language. Doing that means you lose all the portability gains of a not-so-high-level language such as C, and you will probably do a worse optimization job than a compiler would. Plus, well-written and optimized C code can be extremely efficient, as this language is just a very thin layer over assembly. Oh, and back to the original quote. It would have made sense to mention the fact that the system was written in assembly if it had been compared with another system of its epoch written in C. In that case, the argument would have been valid because the compilers were much worse than they are today and the processors were simpler. Just remember that such an assertion is, in general, not true any more.

June 4, 2007 · Tags: <a href="/tags/assembly">assembly</a>, <a href="/tags/c">c</a>, <a href="/tags/processor">processor</a>
Continue reading (about 3 minutes)