Showing 13 posts
When reviewing an incoming C++ PR last week, I left a comment along the lines of: “merge local variable declaration with its initialization”. But why? Is this just a stylistic issue or is there something deeper to warrant making the change? Let’s look at stack frames, C, and then C++ to answer these questions.
July 12, 2021 · Tags: c, c++, readability · Continue reading (about 11 minutes)
Today I would like to dive into the topic of unused parameters in C and C++: why they may happen and how to properly deal with them, because smart compilers will warn you about their presence should you enable -Wunused-parameter or -Wextra, and even error out if you are brave enough to use -Werror.
You would think that unused parameters should never exist: if the parameter is not necessary as an input, it should not be there in the first place! That’s a pretty good argument, but it does not hold when polymorphism enters the picture: if you want to have different implementations of a single API, such API will have to provide, on input, a superset of all the data required by all the possible implementations.
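As a rough illustration of that argument, consider a hypothetical filter callback type (filter_fn and every function name below are made up for this sketch, not taken from any real API). Every implementation receives the superset of inputs, so some of them necessarily ignore a parameter; casting the parameter to void is one common way to tell GCC and Clang that this is intentional:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical API: every filter gets both a name and a size, even if a
 * particular implementation only cares about one of them. */
typedef int (*filter_fn)(const char *name, size_t size);

/* Uses neither input. */
int
accept_all(const char *name, size_t size)
{
    (void)name;  /* Deliberately unused. */
    (void)size;  /* Deliberately unused. */
    return 1;
}

/* Only looks at the size. */
int
accept_small(const char *name, size_t size)
{
    (void)name;  /* Deliberately unused. */
    return size < 1024;
}

/* Only looks at the name. */
int
accept_c_source(const char *name, size_t size)
{
    const char *dot = strrchr(name, '.');

    (void)size;  /* Deliberately unused. */
    return dot != NULL && strcmp(dot, ".c") == 0;
}

/* The caller is oblivious to which inputs each implementation needs. */
size_t
count_accepted(filter_fn filter, const char *const names[],
               const size_t sizes[], size_t count)
{
    size_t accepted = 0;

    for (size_t i = 0; i < count; i++) {
        if (filter(names[i], sizes[i]))
            accepted++;
    }
    return accepted;
}
```

Any of the three filters can be handed to count_accepted, which is exactly why the signature cannot be trimmed to suit a single implementation.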
February 16, 2015 · Tags: c, cxx · Continue reading (about 6 minutes)
Update (2014-12-19): The advice provided in this blog post is questionable and, in fact, probably incorrect. The bug described below must have happened for some unrelated reason (like, maybe, reuse of ap), but at this point (three years later!) I do not really remember what was going on here nor have much interest in retrying.
A long time ago, while I was preparing an ATF release, I faced many failing tests and crashes in one of the platforms under test. My memory told me this was a problem in OpenSolaris, but the repository logs say that the problem really happened in Fedora 8 x86_64.

    void
    foo_fmt(const char *fmt, ...)
    {
        va_list ap;

        va_start(ap, fmt);
        foo_ap(fmt, ap);
        va_end(ap);
    }

    void
    foo_ap(const char *fmt, va_list ap)
    {
        char buf[128];

        vsnprintf(buf, sizeof(buf), fmt, ap);
        /* ... now, do something with buf ... */
    }

The codebase of ATF provides _fmt and _ap variants for many functions to give more flexibility to the caller and, as shown above, the _fmt variant just relies on the _ap variant to do the real work.

    void
    foo_ap(const char *fmt, va_list ap)
    {
        char buf[128];
        va_list ap2;

        va_copy(ap2, ap);
        vsnprintf(buf, sizeof(buf), fmt, ap2);
        va_end(ap2);
        /* ... now, do something with buf ... */
    }

This duplication of the ap argument pointing to the variable list of arguments ensures that ap2 can be safely used from the new stack frame.
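Where va_copy is unquestionably needed is when the same argument list has to be traversed more than once, because the C standard leaves ap indeterminate in the caller once a function such as vsnprintf has consumed it. The following sketch (format_alloc and format_alloc_ap are hypothetical names, not part of ATF) measures the output with a copy and then formats with the original list:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Formats into a freshly allocated buffer.  The list is walked twice:
 * once to compute the length (using a copy) and once to do the real work. */
char *
format_alloc_ap(const char *fmt, va_list ap)
{
    va_list ap2;

    va_copy(ap2, ap);
    const int length = vsnprintf(NULL, 0, fmt, ap2);
    va_end(ap2);
    if (length < 0)
        return NULL;

    char *buf = malloc((size_t)length + 1);
    if (buf == NULL)
        return NULL;
    vsnprintf(buf, (size_t)length + 1, fmt, ap);
    return buf;
}

/* The usual _fmt flavor that defers to the _ap flavor. */
char *
format_alloc(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    char *result = format_alloc_ap(fmt, ap);
    va_end(ap);
    return result;
}
```

The caller owns the returned buffer and is expected to free() it.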
September 12, 2011 · Tags: c, portability · Continue reading (about 3 minutes)
In C, particularly due to the lack of dynamic strings, it's common to pass format strings around together with a variable set of arguments. A prototype like this is very common:
void my_printf(const char*, ...);
For the standard printf and similar functions, some compilers will ensure that the variable list of arguments matches the positional parameters in the format string and, if they don't match, raise a warning. This is, however, just a warning "hardcoded" to match these functions, as the compiler can't know how the variable arguments of our custom my_printf function relate to the first argument.
Or can it?
I was made aware of a nice GCC attribute that allows developers to tag printf-like functions in a manner that allows the compiler to perform the same validation of variable arguments and format strings. This is in the form of a GCC __attribute__ that also happens to work with Clang. Let's see an example to illustrate how this works:
    #include <stdarg.h>
    #include <stdio.h>

    static void my_printf(const char*, ...)
        __attribute__((format(printf, 1, 2)));

    static void
    my_printf(const char* format, ...)
    {
        va_list ap;

        printf("Custom printf: ");

        va_start(ap, format);
        vprintf(format, ap);
        va_end(ap);
    }

    int
    main(void)
    {
        my_printf("this is valid %d\n", 3);
        my_printf("but this is not %f\n", 3);
    }

    $ clang example.c
    example.c:22:33: warning: conversion specifies type 'double' but
          the argument has type 'int' [-Wformat]
        my_printf("but this is not %f\n", 3);
                                   ~^     ~
    1 warning generated.
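The same attribute also covers the va_list flavor of a function. As documented for GCC, passing 0 as the last argument tells the compiler that there are no arguments to check directly, so only the format string itself is validated. A small sketch, with hypothetical function names (log_to_buffer and log_to_buffer_ap):

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helpers; the attributes are the point of this sketch.
 * The format string is parameter 3 in both cases.  The variadic flavor
 * has its first checked argument at position 4, while the va_list
 * flavor has none and therefore uses 0. */
int log_to_buffer_ap(char *buf, size_t size, const char *fmt, va_list ap)
    __attribute__((format(printf, 3, 0)));
int log_to_buffer(char *buf, size_t size, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

int
log_to_buffer_ap(char *buf, size_t size, const char *fmt, va_list ap)
{
    return vsnprintf(buf, size, fmt, ap);
}

int
log_to_buffer(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    const int written = log_to_buffer_ap(buf, size, fmt, ap);
    va_end(ap);
    return written;
}
```

With these declarations in place, a mismatched call to log_to_buffer should be diagnosed under -Wformat just like a mismatched call to printf.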
June 17, 2011 · Tags: c · Continue reading (about 2 minutes)
In C (or, for that matter, several other languages such as Python or C++), most native types can be coerced to a boolean type: expressions that deliver integers, pointers or characters are automatically treated as boolean values whenever needed. For example, non-zero integer expressions and non-NULL pointers evaluate to true, whereas zero or NULL evaluate to false.
Many programmers take advantage of this fact by stating their conditionals like this:
    void func(const int in) {
        if (in) {
            /* ... do something when in != 0 ... */
        } else {
            /* ... do something else when in == 0 ... */
        }
    }

    bool func(const struct mystruct *ptr) {
        int out = calculate_out(ptr);
        /* ... do something more with out ... */
        return out; // The return type is bool though; is this ok?
    }
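One way to see why that return statement deserves a second look: with C99's bool (from stdbool.h) any non-zero value converts to true, but with a pre-C99 typedef the value is simply truncated to fit. In this sketch, legacy_bool, FLAG_READY and the function names are all hypothetical:

```c
#include <stdbool.h>

/* A "boolean" as commonly defined by pre-C99 codebases. */
typedef unsigned char legacy_bool;

#define FLAG_READY 0x100

/* Implicit coercion: 0x100 does not fit in an unsigned char, so the
 * result wraps around to 0 and the function reports "false". */
legacy_bool
is_ready_legacy(int flags)
{
    return flags & FLAG_READY;
}

/* Implicit coercion to C99 bool: any non-zero value becomes true. */
bool
is_ready(int flags)
{
    return flags & FLAG_READY;
}

/* Explicit comparison: correct regardless of how the boolean type is
 * defined, and the intent is visible to the reader. */
bool
is_ready_explicit(int flags)
{
    return (flags & FLAG_READY) != 0;
}
```

The explicit form costs a few characters and removes the dependency on what the return type happens to be.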
April 30, 2011 · Tags: c · Continue reading (about 2 minutes)
Some of the methods of the Lua C API can raise errors. To get an initial idea of what these are, take a look at the Functions and Types section and pay attention to the third field of a function description (the one denoted by 'x' in the introduction).

    my_array = nil
    return my_array["test"]

... which is obvious because indexing a non-table object is a mistake. Now let's consider what this code would look like in C (modulo the my_array assignment):

    lua_getglobal(state, "my_array");
    lua_pushstring(state, "test");
    lua_gettable(state, -2);

Simple, huh? Sure, but as it turns out, any of the API calls (not just lua_gettable) in this code can raise errors (I'll call them unsafe functions). What this means is that, unless you run the code with a lua_pcall wrapper, your program will simply exit in the face of a Lua error. Uh, your scripting language can "crash" your host program out of your control? Not nice.
January 7, 2011 · Tags: c, cxx, lua · Continue reading (about 6 minutes)
For a long time, I have been aware of the existence of the standard C functions setjmp and longjmp and that they can be used to simulate exceptions in C code. However, it wasn't until yesterday that I had to use them... and it was not trivial. The documentation for these functions tends to be confusing, and understanding them required looking for additional documents and a bit of experimentation. Let's see if this post helps in clarifying how these functions work.
The first call to setjmp causes the execution context (stack pointer, CPU registers, etc.) to be saved in the provided jmp_buf structure and, then, a value of 0 to be returned. A subsequent call to longjmp with the same jmp_buf structure causes the process to go "back in time" to the state stored in said structure. The way this is useful is that, when going back in time, we tweak the return value of the setjmp call so we can actually run a second (or third or more) path as if nothing had happened.
Let's see an example:
    #include <setjmp.h>
    #include <stdio.h>
    #include <stdlib.h>

    static jmp_buf buf;

    static void
    myfunc(void)
    {
        printf("In the function.\n");

        /* ... do some complex stuff ... */

        /* Go back in time: restore the execution context of setjmp
         * but make the call return 1 instead of 0. */
        longjmp(buf, 1);

        printf("Not reached.\n");
    }

    int
    main(void) {
        if (setjmp(buf) == 0) {
            /* Try block. */
            printf("Trying some function that may throw.\n");
            myfunc();
            printf("Not reached.\n");
        } else {
            /* Catch block. */
            printf("Exception caught.\n");
        }

        return EXIT_SUCCESS;
    }

The example above shows the following when executed:

    Trying some function that may throw.
    In the function.
    Exception caught.

So, what happened above? The code starts by calling setjmp to record the execution state and the call returns 0, which causes the first part of the conditional to run. You can think of this clause as the "try" part of an exception-based code. At some point during the execution of myfunc, an error is detected and is "thrown" by a call to longjmp and a value of 1. This causes the process to go back to the execution of setjmp but this time the call returns 1, which causes the second part of the conditional to run. You can think of this second clause as the "catch" part of an exception-based code.
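Because the value handed to longjmp becomes the return value of setjmp, the same mechanism can tell several "exception types" apart. The sketch below uses hypothetical names (try_double, checked_double and the ERR_ constants) and dispatches on the value with a switch, which is one of the contexts in which the C standard allows setjmp to appear:

```c
#include <setjmp.h>

/* Hypothetical error codes; 0 is reserved for the direct setjmp return. */
enum { ERR_NONE = 0, ERR_NEGATIVE = 1, ERR_TOO_BIG = 2 };

static jmp_buf handler;

/* "Throws" a different code depending on what went wrong. */
static int
checked_double(int value)
{
    if (value < 0)
        longjmp(handler, ERR_NEGATIVE);
    if (value > 1000)
        longjmp(handler, ERR_TOO_BIG);
    return value * 2;
}

/* Returns ERR_NONE and stores the result in *out on success, or the
 * code of the error that was "caught" otherwise. */
int
try_double(int value, int *out)
{
    switch (setjmp(handler)) {
    case 0:
        /* Try block. */
        *out = checked_double(value);
        return ERR_NONE;
    case ERR_NEGATIVE:
        /* Catch block for the first kind of error. */
        return ERR_NEGATIVE;
    default:
        /* Catch block for everything else. */
        return ERR_TOO_BIG;
    }
}
```

Note that no local variable is modified between the setjmp call and the longjmp; any that were would need to be declared volatile to have a reliable value in the catch branches.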
January 2, 2011 · Tags: c · Continue reading (about 3 minutes)
For the last couple of days, I have been playing around with the Lua C API and have been writing a thin wrapper library for C++. The main purpose of this auxiliary library is to ensure that global interpreter resources such as the global state or the execution stack are kept consistent in the presence of exceptions — and, in particular, that none of these are leaked due to programming mistakes when handling error codes.
To illustrate this point, let's forget about Lua and consider a simpler case. Suppose we lost the ability to pass arguments and return values from functions in C++ and all we have is a stack that we pass around. With this in mind, we could implement a multiply function as follows:
    void multiply(std::stack< int >& context) {
        const int arg1 = context.top();
        context.pop();
        const int arg2 = context.top();
        context.pop();
        context.push(arg1 * arg2);
    }

And we could call our function like this:

    std::stack< int > context;
    context.push(5);
    context.push(6);
    multiply(context);
    const int result = context.top();
    context.pop();

In fact, my friends, this is more-or-less what your C/C++ compiler is internally doing when converting code to assembly language. The way the stack is organized to perform calls is known as the calling conventions of an ABI (language/platform combination).
    void magic(std::stack< int >& context) {
        const int arg1 = context.top();
        context.pop();
        const int arg2 = context.top();
        context.pop();

        context.push(arg1 * arg2);
        context.push(arg1 / arg2);
        try {
            // ... do something with the two values on top ...

            context.push(arg1 - arg2);
            try {
                // ... do something with the three values on top ...
            } catch (...) {
                context.pop(); // arg1 - arg2
                throw;
            }
            context.pop();
        } catch (...) {
            context.pop(); // arg1 / arg2
            context.pop(); // arg1 * arg2
            throw;
        }
        context.pop();
        context.pop();
    }

The above is a completely fictitious and useless function, but serves to illustrate the point. magic() starts by pushing two values on the stack and then performs some computation that reads these two values. It later pushes an additional value and does some more computations on the three temporary values that are on the top of the stack.
    class temp_stack {
        std::stack< int >& _stack;
        int _pop_count;

    public:
        temp_stack(std::stack< int >& stack_) :
            _stack(stack_), _pop_count(0) {}

        ~temp_stack(void)
        {
            while (_pop_count-- > 0)
                _stack.pop();
        }

        void push(int i)
        {
            _stack.push(i);
            _pop_count++;
        }
    };

With this, we can rewrite our function as:
    void magic(std::stack< int >& context) {
        const int arg1 = context.top();
        context.pop();
        const int arg2 = context.top();
        context.pop();

        temp_stack temp(context);
        temp.push(arg1 * arg2);
        temp.push(arg1 / arg2);
        // ... do something with the two values on top ...

        temp.push(arg1 - arg2);
        // ... do something with the three values on top ...

        // Yes, we can return now. No need to do manual pop()s!
    }

Simple, huh? Our temp_stack class keeps track of how many elements have been pushed on the stack. Whenever the function terminates, be it due to reaching the end of the body or due to an exception thrown anywhere, the temp_stack destructor will remove all elements previously registered from the stack. This ensures that the function leaves the global state (the stack) as it was on entry, modulo the function parameters consumed as part of the calling conventions.
December 27, 2010 · Tags: c, cxx, kyua, lua · Continue reading (about 6 minutes)
Let's face it: spawning child processes in Unix is a "mess". Yes, the interfaces involved (fork, wait, pipe) are really elegant and easy to understand, but every single time you need to spawn a new child process to, later on, execute a random command, you have to write quite a bunch of error-prone code to cope with it. If you have ever used any other programming language with higher-level abstraction layers — just check Python's subprocess.Popen — you surely understand what I mean.
    static
    atf_error_t
    run_ls(const void *v)
    {
        system("/bin/ls");
        return atf_no_error();
    }

    static
    void
    some_function(...)
    {
        atf_process_stream_t outsb, errsb;
        atf_process_child_t child;
        atf_process_status_t status;

        atf_process_stream_init_redirect_path(&outsb, "ls.out");
        atf_process_stream_init_redirect_path(&errsb, "ls.err");

        atf_process_fork(&child, run_ls, &outsb, &errsb, NULL);
        /* ... yeah, here comes the concurrency! ... */
        atf_process_child_wait(&child, &status);

        if (atf_process_status_exited(&status))
            printf("Exit: %d\n", atf_process_status_exitstatus(&status));
        else
            printf("Error!");
    }
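For comparison, here is roughly what the raw fork/wait sequence looks like when written by hand against the POSIX interfaces, without any redirection of stdout or stderr. This is only a sketch under the assumption that /bin/sh is available; spawn_and_wait is a hypothetical name and is not part of ATF:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs a command through /bin/sh and returns its exit status, or -1 if
 * the child could not be created or did not exit normally. */
int
spawn_and_wait(const char *shell_command)
{
    const pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: replace the process image.  If exec fails we must not
         * return into the caller's code, so bail out right away. */
        execl("/bin/sh", "sh", "-c", shell_command, (char *)NULL);
        _exit(127);
    }

    /* Parent: wait for this specific child, retrying on signals. */
    int status;
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }

    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;
}
```

Even this minimal version has to handle three distinct failure points (fork, exec and waitpid), and adding output redirection would bring in pipe or open plus dup2 on top.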
June 21, 2009 · Tags: atf, boost-process, c · Continue reading (about 3 minutes)
For a long time, ATF has shipped with build-time tests for its own header files to ensure that these files are self-contained and can be included from other sources without having to manually pull in obscure dependencies. However, the way I wrote these tests was a hack since the first day: I use automake to generate a temporary library that builds small source files, each one including one of the public header files. This approach works but has two drawbacks. First, if you do not have the source tree, you cannot reproduce these tests -- and one of ATF's major features is the ability to install tests and reproduce them even if you install from binaries, remember? And second, it's not reusable: I now find myself needing to do this exact same thing in another project... what if I could just use ATF for it?
Even if the above were not an issue, build-time checks are a nice thing to have in virtually every project that installs libraries. You need to make sure that the installed library is linkable to new source code and, currently, there is no easy way to do this. As a matter of fact, the NetBSD tree has such tests and they haven't been migrated to ATF for a reason.
I'm trying to implement this in ATF at the moment. However, running the compiler in a transparent way is a tricky thing. Which compiler do you execute? Which flags do you need to pass? How do you provide a portable-enough interface for the callers?
The approach I have in mind involves caching the same compiler and flags used to build ATF itself and using those as defaults anywhere ATF needs to run the compiler itself. Then, make ATF provide some helper check functions that call the compiler for specific purposes and hide all the required logic inside them. That should work, I expect. Any better ideas?
March 5, 2009 · Tags: atf, c, cxx · Continue reading (about 2 minutes)
One of the things I miss a lot when writing the C-only code bits of ATF is an easy way to raise and handle errors. In C++, the normal control flow of the execution is not disturbed by error handling because any part of the code is free to notify error conditions by means of exceptions. Unfortunately, C has no such mechanism, so errors must be handled explicitly.
At the very beginning I just made functions return integers indicating error codes, reusing the standard error codes of the C library. However, that turned out to be too simple for my needs and was not easily applicable to functions whose natural return value is not an integer.
What I ended up doing was defining a new type, atf_error_t, which must be returned by all functions that can raise errors. This type is a pointer to a memory region that can vary in contents (and size) depending on the error raised by the code. For example, if the error comes from libc, I mux the original error code and an informative message into the error type so that the original, non-mangled information is available to the caller; or, if the error is caused by the user's misuse of the application, I simply return a string that contains the reason for the failure. The error structure contains a type field that the receiver can query to know which specific information is available and, based on that, cast down the structure to the specific type that contains detailed information. Yes, this is very similar to how you work with exceptions.
In the case of no errors, a null pointer is returned. This way checking for an error condition is just a simple pointer check, which is no more expensive than an integer check. However, handling error conditions is more costly, but given that these are rare, it is certainly not a problem.
What I don't like too much about this approach is that any other return value must be returned as an output parameter, which makes things a bit confusing. Furthermore, robust code ends up cluttered with error checks all around given that virtually any call to the library can produce an error somewhere. This, together with the lack of RAII modeling, complicates error handling a lot. But I can't think of any other way that could be simpler but, at the same time, as flexible as this one. Ideas? :P
More details are available in the atf-c/error.h and atf-c/error.c files.
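To make the scheme concrete, here is a minimal sketch of the idea. It is not the actual code in atf-c/error.h: every name below (my_error_t, my_libc_error_new, parse_long and so on) is hypothetical and the structures are simplified.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Common header shared by every error; the type tag tells the receiver
 * which concrete structure it can cast down to. */
struct my_error {
    const char *type;
};
typedef struct my_error *my_error_t;

/* Concrete error that keeps the original libc code and a message. */
struct my_libc_error {
    struct my_error base;  /* Must be the first member. */
    int code;
    char message[64];
};

/* Fallback used when not even the error object can be allocated. */
static struct my_error no_memory_error = { "no_memory" };

/* "No error" is just a null pointer, so checks are cheap. */
my_error_t
my_no_error(void)
{
    return NULL;
}

int
my_is_error(const my_error_t err)
{
    return err != NULL;
}

my_error_t
my_libc_error_new(int code, const char *message)
{
    struct my_libc_error *err = malloc(sizeof(*err));
    if (err == NULL)
        return &no_memory_error;
    err->base.type = "libc";
    err->code = code;
    snprintf(err->message, sizeof(err->message), "%s", message);
    return &err->base;
}

void
my_error_free(my_error_t err)
{
    if (err != NULL && err != &no_memory_error)
        free(err);
}

/* The real result travels through an output parameter because the
 * return value is reserved for the error. */
my_error_t
parse_long(const char *text, long *out)
{
    char *end;

    errno = 0;
    const long value = strtol(text, &end, 10);
    if (errno != 0)
        return my_libc_error_new(errno, "Number out of range");
    if (end == text || *end != '\0')
        return my_libc_error_new(EINVAL, "Not a number");
    *out = value;
    return my_no_error();
}
```

A caller checks the returned pointer, inspects the type tag and, if it reads "libc", casts to struct my_libc_error to reach the code and message before releasing the error with my_error_free.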
February 24, 2008 · Tags: atf, c · Continue reading (about 2 minutes)
I have spent part of the past week and this whole weekend working on a C-only library for ATF test programs. An extremely exhausting task. However, I wanted to do it because there is reluctance in NetBSD to write test programs in C++, which is understandable, and delaying it more would have made things worse in the future. I ran into this situation myself some days ago when writing tests for very low level stuff; using C++ there felt clunky, but it was still possible of course.
I have had to reimplement lots of things that come for free in any other, higher-level (not necessarily high-level) language. This includes, for example, a "class" to deal with dynamic strings, another one for dynamic linked lists and iterators, a way to propagate errors until the point where they can be managed... and I have spent quite a bit of time debugging crashes due to memory management bugs, something that I rarely encountered in the C++ version.
However, the new interface is, I believe, quite neat. This is not because of the language per se, but because the C++ interface has grown "incorrectly". It was the first code in the project and it shows. The C version has been written from the ground up with all the requirements known beforehand, so it is cleaner. This will surely help in cleaning up the C++ version later on, which cannot die anyway.
The code for this interface is in a new branch, org.NetBSD.atf.src.c, and will hopefully make it to ATF 0.5: it still lacks a lot of features, hence why it is not on mainline. Ah, the joys of a distributed VCS: I have been able to develop this experiment locally and privately until it was decent enough to be published, and now it is online with all history available!
From now on C++ use will be restricted to the ATF tools inside ATF itself, and to those users who want to use it in their projects. Test cases will be written using the C library except for those that unit-test C++ code.
February 18, 2008 · Tags: atf, c · Continue reading (about 2 minutes)
I was reading an article the other day and found an assertion that bugged me. It reads:
"System 6.0.8 is not only a lot more compact since it has far fewer (mostly useless) features and therefore less code to process, but also because it was written in assembly code instead of the higher level language C. The lower the level of the code language, the less processing cycles are required to get something done."
It is not the first time I see someone claiming that writing programs in assembly by hand makes them faster, and I'm sure it is not the last time I'll see this. This assertion is, simply put, wrong.
June 4, 2007 · Tags: assembly, c, processor · Continue reading (about 3 minutes)