Yesterday, I packaged evolution-webcal (which was a trivial task), but, as I expected, it didn't work. In fact, I realised that neither the contacts view nor the calendar view of Evolution 2.0 was working at all. I could see the components, but I couldn't interact with them. So I started to debug the problem.

In the console, Evolution printed several warnings saying that it couldn't activate some Bonobo component coming from Evolution Data Server. Uhm... ok; I searched through the code for that message, noted which function it was in and launched Evolution through gdb, adding a breakpoint in that function.

When the breakpoint was triggered, I saw strange things: the backtrace only contained two frames. Of these, a string parameter in frame 0 was null. "Oh, that must be the problem", I thought. Stupid me. Switched to frame 1 and saw that the string was, in fact, correct. "Ew, looks like something is going wrong in gdb". So I added several printf(3)'s in the code to check the pointer's value. All of them were correct; no null pointers anywhere.

My next thought was that gdb 5.3 (the version that comes with NetBSD) does not handle threads correctly, which made me install gdb 6.2.1 from pkgsrc. Hmm, this one has some nice features, like setting a breakpoint in a function that is not loaded yet (it gets resolved when the shared library that contains it is opened). But it still showed me incorrect traces and parameters. Unfortunately, after many attempts, I had to forget about gdb. I don't know if I'm missing something or if it doesn't support NetBSD threading correctly yet.

So I kept debugging with printf(3)'s, trying to understand why the call to bonobo_activation_activate_from_id (which I had already isolated) was returning an error. Well, better said, why it returned no object: the code around the call was not handling errors at all (proper error handling there would have saved me a lot of trouble).

Did some more tests but got bored quickly: rebuilding Evolution and running it from the source tree is not fun. Solution: create a small test case that calls the failing function with the same parameters. This took a bit of time because I had to do some research about the bonobo-activation and libbonobo APIs. But in the end, I got it. And luckily, it behaved as expected: it failed to load the components! Passing OAFIID:GNOME_Evolution_DataServer_InterfaceCheck to the function made it fail, while picking a non-evolution-related component worked properly (I tried with OAFIID:Fontilus_Context_Menu_Factory). Ok, now, to look for differences between these two to see why one failed but not the other.

After several stupid tests, I added better error handling to my test case and got a message that said something like: "Cannot read from child process". Yay! This gave me the final clue.

Executed /usr/pkg/libexec/evolution-data-server-1.0 by hand and could see it dumping core. ktrace(1)'d it and saw it was calling a NetBSD 1.3 compatibility function. Huh? Tried with gdb, which this time was useful: I could see that the last call before the segfault was related to sigaction(2) and could get a useful call trace.

Inspected the code, tried to disable the signalling stuff and it ran properly! "Wow, I'm really close to the bug", I thought. Afterwards, I reenabled it, and while compiling the affected file, I could see a related warning that I'd never noticed before:

server.o(.text+0x109): In function `main':
evolution-data-server-1.0.2/src/server.c:129: warning: reference to compatibility sigemptyset(); include <signal.h> for correct reference

And here is the solution: added #include <signal.h> to the server.c file and everything worked properly. Evolution Data Server does not dump core any more and Evolution 2.0 works quite well. Isn't this one of the most stupid fixes you can think of?

I'm still surprised that the lack of a header file caused this kind of problem at run time. Guess I'll have to investigate a bit more why this happens. But not now; this took me more than two hours to discover and fix!