Production software: Constants will bite you

Early on in your career as a developer, you are told to never put constant values in the code and instead define those separately using a self-descriptive name. For example, instead of doing this:

def modify_remote_file(...):
   contents = get_remote_file(server_url, file_name, timeout=60)
   ...
   put_remote_file(server_url, file_name, contents, timeout=60)

You would do this:

# Maximum amount of time a request on the file server can take.
FILE_SERVER_TIMEOUT = 60

def modify_remote_file(...):
   contents = get_remote_file(server_url, file_name,
                              timeout=FILE_SERVER_TIMEOUT)
   ...
   put_remote_file(server_url, file_name, contents,
                   timeout=FILE_SERVER_TIMEOUT)

This is obviously good advice so I’m not going to dispel it here: the hardcoded number in the first snippet was hard to keep track of, hard to keep consistent across calls and hard to tweak when necessary.

However, for software that runs in production far from the developer’s machine, you have to go one step further: any constant that affects the behavior of the software and that may need to be tweaked under any circumstances—no matter how unlikely—should be a tunable that can be redefined at runtime. The key here is not being able to change the value while the server is running; that does not matter much. The key is to be able to change the constant’s value without having access to the source code nor having to rebuild the binary.

But… why?

Because a constant value can turn into an outage and, when that happens, you want to be able to fix the immediate problem quickly without having to go to the source code at all. In our specific example above, suppose that code snippet was part of a server’s startup to log some data in a remote server: if the file server became slow or was suffering from latency issues, it’d be inconvenient for this other server to not (re)start up properly because it’s timing out on the file server!
Because a constant value may prevent you from fixing an outage quickly without having to rebuild the binary. This is an extension to the previous point. Suppose your server starts consuming too much CPU due to an inefficient algorithm that does not scale well with load and, to fix the current CPU shortcoming, you want to crank down the rate at which a certain background operation happens. If the rate at which such background operation runs is hardcoded in a constant, the only way you could do this is by rebuilding the binary.

In other words: what we really want to achieve is a way for the operator to be able to tweak the behavior of a server at will while attempting to fix an outage, without having to modify the binary at all.

Remember: rebuilding binaries and pushing them to production is not a quick task. It can be easy in some cases, but as with anything that touches production, you should be careful to not make the problem worse. Plus: who said the operator for the production service was the same person as the one that knows and is able to build new binaries in the correct manner? Oftentimes the operator won’t know how to do the latter — or he may just not have the permissions!

This article is part number 3 of 7 of the Production software series.

Production software: Constants will bite you

Featured software

Featured posts