A coworker just sent me some Python code for review and, among such code, there was the addition of a function similar to:
def PathWithCurrentDate(prefix, now=None):
"""Extend a path with a year/month/day subdirectory layout.
Args:
prefix: string, The path to extend with the date subcomponents.
now: datetime.date, The date to use for the path; if None, use
the current date.
Returns:
string, The new computed path with the date appended.
"""
path = os.path.join(prefix, '%Y', '%m', '%d')
if now:
return now.strftime(path)
else:
return datetime.datetime.now().strftime(path)
The purpose of this function, as the docstring says, is to simplify the construction of a path that lays out files on disk depending on a given date.
This function works just fine… but it has a serious design problem (in
my opinion) that you only see when you try to write unit tests for such
function (guess what, the code to review did not include any unit tests
for this). If I ask you to write tests for PathWithCurrentDate
, how
would you do that? You would need to consider these cases (at the very
very least):
- Passing
now=None
correctly fetches the current date. To write such a test, we must stub out the call todatetime.datetime.now()
so that our test is deterministic. This is easy to do with helper libraries but does not count as trivial to me. - Could
datetime.datetime.now()
raise an exception? If so, test that the exception is correctly propagated to match the function contract. - Passing an actual date to
now
works. We know this is a different code path that does not calldatetime.datetime.now()
, but still we must stub it out to ensure that the test is not going through that past in case the current date actually matches the date hardcoded in the test as an argument tonow
.
My point is: why is such a trivial function so complex to validate? Why such a trivial function needs to depend on external state? Things become more obvious if we take a look at a caller of this function:
def BackupTree(source, destination):
path = PathWithCurrentDate(destination)
CreateArchive(source, os.path.join(path, 'archive.tar.gz'))
Now, question again: how do we test this? Our tests would look like:
def testOk(self):
# Why do we even have to do this?
... create stub for datetime.datetime.now() to return a fake date ...
CreateArchive('/foo', '/backups/prefix')
... validate that the archive was generated in the fake date directory ...
Having to stub out the call to datetime.datetime.now()
before calling
CreateArchive
is a really, really weird thing at first glance. To be
able to write this test, you must have deep insight of how the auxiliary
functions called within the function work to know what dependencies on
external state they have. Lots of black magic involved.
All this said, the above may not seem like a big issue because, well, a
call to datetime.datetime.now()
is cheap. But imagine that the call
being performed deep inside the dependency tree was more expensive and
dealt with some external state that is hard to mock out.
The trick to make this simpler and clearer is to apply a form of
Dependency
injection (or,
rather, “value injection”). We want the PathWithCurrentDate
function
to be a simple data manipulation routine that has no dependencies on
external state (i.e. make it purely functional). The easiest way to do
so is to remove the now=None
code path and pass the date in right from
the most external caller (aka, the main()
program). For example
(skipping docstrings for brevity):
def PathWithCurrentDate(prefix, now):
path = os.path.join(prefix, '%Y', '%m', '%d')
return now.strftime(path)
def BackupTree(source, destination, backup_date):
path = PathWithCurrentDate(destination, backup_date)
CreateArchive(source, os.path.join(path, 'archive.tar.gz'))
With this approach, the dependency on datetime.datetime.now()
(aka, a
dependency on global state) completely vanishes from the code. The code
paths to validate are less, and they are much simpler to test. There is
no need to stub out a function call seemingly unused by BackupTree
.
Another advantage of this approach can be seen if we were to have multiple functions accessing the same path. In this case, we would need to ensure that all calls receive the exact same date… what if the program kept running past 12AM and the “now” value changed? It is trivial to reason about this feature if the code does not have hidden queries to “now” (aka global state) within the code… but it becomes tricky to ensure our code is right if we can’t easily audit where the “now” value is queried from!
The “drawback”, as some will think, is that the caller of any of these functions must do more work on its own to provide the correct arguments to the called functions. “And if I always want the backup to be created on the current directory, why can’t the backup function decide on itself?”, they may argue. But, to me, the former is definitely not a drawback and the latter… is troublesome as explained in this post.
Another “drawback”, as some others would say, is that testing is not a goal. Indeed it is not: testing is only a means to “correct” code, but it is also true that having testable code often improves (internal) APIs and overall design.
To conclude: the above is an over-simplistic case study and my explanations will surely not convince anyone to stop doing black evil “clever” magic from within functions (and, worse, from within constructors). You will only realize that the above makes any sense when you start unit-testing your code. Start today! :-)