Yesterday, for some unknown reason, I ended up running id(1) with two different user names as its arguments. Mysteriously, I only got the details for the first user back and no error for the second one. After looking at the manual page and what the GNU implementation did, I realized that the command is only supposed to take a single user or none at all as part of its arguments.
OK, so "let's add a simple argc check to the code and raise the appropriate error when it is greater than 2". Yeah, right. If you look at id(1)'s main routine, you'll find an undecipherable piece of spaghetti code — have you ever thought about adding multiple ?flag variables and checking the result of the sum? — that comes from the fact that id(1)'s code is shared across three different programs: id(1), groups(1) and whoami(1).
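To give an idea of the idiom, here is a minimal sketch in C. It is not the actual NetBSD code: the struct `id_opts`, the function `id_usage_ok` and the exact set of flags are assumptions made for illustration, modeled loosely on the primary options a BSD id(1) accepts. Each primary option sets an int flag to 1, so adding the flags up counts how many of them were given.

```c
#include <stdbool.h>

/*
 * Hypothetical sketch, not the NetBSD source.  Each primary option sets
 * its flag to 1 when seen on the command line.
 */
struct id_opts {
    int Gflag;  /* assumed: -G, list all group IDs */
    int gflag;  /* assumed: -g, print the group ID */
    int pflag;  /* assumed: -p, human-readable output */
    int uflag;  /* assumed: -u, print the user ID */
};

/*
 * Returns true when the command line is acceptable under the assumed
 * rules: at most one primary option and at most one user operand.
 */
bool
id_usage_ok(const struct id_opts *o, int noperands)
{
    /* The "sum of the flags" trick: a sum above 1 means a conflict. */
    if (o->Gflag + o->gflag + o->pflag + o->uflag > 1)
        return false;

    /* The check I wanted to add: reject a second user name. */
    return noperands <= 1;
}
```

In a single program this stays readable. The trouble starts when the same routine also has to decide whether it was invoked as groups(1) or whoami(1), each with its own rules.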
After spending some time trying to understand the rationale behind the code, I concluded that I could not safely fix the problem as easily as I first thought. Touching the logic in there would most likely cause a regression somewhere else, basically because id(1) has multiple primary, mutually-exclusive options and groups(1) and whoami(1) are supposed to have their own syntax. Refactoring the code would be just as unsafe.
So what did I do? Thanks to ATF already being in NetBSD, I spent the day writing tests for all possible usages of the three commands (which was not trivial at all) and, of course, added stricter tests to ensure that the documented command line syntax was enforced by the programs. After that, I was fairly confident that if I changed the code and all the new tests passed afterwards (especially those that passed before), I had not broken anything. So I made the change only after the tests were done.
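The core of such a black-box test is simple: run the command with a given set of arguments and look at how it exits. The sketch below shows that idea in plain C using only POSIX calls. It is not ATF and not the tests I wrote; the helper name `run_status` is made up for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: run argv[0] (looked up in PATH) with the given
 * NULL-terminated argument vector, discard its output, and return its
 * exit status.  Returns -1 if the child could not be created or did not
 * exit normally, and 127 if the command could not be executed.
 */
int
run_status(char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: silence stdout and stderr, then exec the command. */
        int fd = open("/dev/null", O_WRONLY);
        if (fd != -1) {
            dup2(fd, STDOUT_FILENO);
            dup2(fd, STDERR_FILENO);
            if (fd > STDERR_FILENO)
                close(fd);
        }
        execvp(argv[0], argv);
        _exit(127);  /* only reached when exec failed */
    }

    /* Parent: wait for the child and decode how it terminated. */
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

A syntax test for the bug above would call `run_status` with something like `{ "id", "root", "nobody", NULL }` and expect a nonzero result. Whether that holds depends on the id(1) implementation installed and on the users that exist on the machine. A real test would also check the error message, not just the exit code.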
I know it will be hard to "impose" such a testing/bug-fixing procedure on other developers, but I would really like them to consider extensive testing... even for obvious changes or for trivial tools such as these. You never know that you have broken something until someone else complains later.