Do comments belong with the unit tests instead?

I was working through a practice interview problem, and had returned to it to experiment, when it struck me that I had left myself a comment that I never deleted:

Insert x in your data structure.

Delete one occurrence of y from your data structure, if present.

Check if any integer is present whose frequency is exactly z. If yes, print 1 else 0.

This was in regard to an imaginary instruction set for which I had written a switch statement. In production code, I would have immediately seen this as a code smell. After all, if the cases were that difficult to remember, that should have been an instant trigger to use descriptive variables and refactor to well-named methods with appropriate tests.

So, I went through and started thinking about what the proper names would be, and whether I could make them descriptive enough that I could come back 6 months later and know what was going on without having to look elsewhere. In this example, we can point to the HackerRank specification. However, there is a train of thinking that argues documentation is an anti-pattern.¹ So, with that in mind, what would code without a textual specification look like?

Let’s look at the names first. We could keep it simple and suggest insert, deleteOneOccurrence, and queryFrequency, but queryFrequency is a poor name here. We are checking whether any integer is present whose frequency matches the input. That is, if there are six 8s, the query 3 6 (operation 3 with z = 6) prints 1. isAnyIntegerPresentWhoseFrequencyExactlyX, though more descriptive, might be difficult to parse without context. While completely contrived as an example, we have all reached places where we racked our brains to think of just the right name.² Imagine trying to explain a trie if it had not already been named!
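To make the naming question concrete, here is a minimal Python sketch of what the switch statement could turn into once each case has its own descriptive method. The class name FrequencyTracker, the snake_case method names, and the apply_operation helper are all assumptions made for illustration, not the original solution. Keeping two counters is just one common way to answer the frequency query quickly.

```python
from collections import Counter


class FrequencyTracker:
    """Hypothetical multiset that can answer: does any value occur exactly z times?"""

    def __init__(self) -> None:
        self._count_of = Counter()           # value -> how many copies are stored
        self._values_with_count = Counter()  # count -> how many distinct values have it

    def _change_count(self, value: int, delta: int) -> None:
        old = self._count_of[value]
        new = old + delta
        if old > 0:
            self._values_with_count[old] -= 1
        if new > 0:
            self._values_with_count[new] += 1
        self._count_of[value] = new

    def insert(self, value: int) -> None:
        self._change_count(value, +1)

    def delete_one_occurrence(self, value: int) -> None:
        # Deleting a value that is not present is a no-op.
        if self._count_of[value] > 0:
            self._change_count(value, -1)

    def is_any_integer_present_with_frequency(self, frequency: int) -> bool:
        return self._values_with_count[frequency] > 0


def apply_operation(tracker: FrequencyTracker, operation: int, argument: int):
    """Dispatch one (operation, argument) pair; queries return 1 or 0, others None."""
    if operation == 1:
        tracker.insert(argument)
    elif operation == 2:
        tracker.delete_one_occurrence(argument)
    elif operation == 3:
        return 1 if tracker.is_any_integer_present_with_frequency(argument) else 0
    else:
        raise ValueError(f"unknown operation: {operation}")
    return None
```

With six 8s inserted, apply_operation(tracker, 3, 6) returns 1. The numbered branches still need the dispatcher to explain them, but each method now reads on its own.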

Test-driven design advocates would laugh. “Why don’t you just check your executable spec - the well-written test?” That’s a great idea! Now, in something like RSpec, I have the option of a well-written specification like it "prints 1 if any integer is present z times in the data structure" and the converse, it "prints 0 if no integer is present z times in the data structure". That’s… more helpful. However, this is a novel data structure that is inherently confusing at first glance. What we might need here is an answer to “What is the functional reason we care about a set of frequencies in integers?” Our specification is documentation, but it may be incomplete documentation.
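For readers who do not use RSpec, the same two specifications can be sketched with Python's standard unittest module, where the test method names carry the description. The function frequency_query_output and the class FrequencyQuerySpec are hypothetical names invented for this sketch.

```python
import unittest
from collections import Counter


def frequency_query_output(values, z):
    """Return 1 if any integer occurs exactly z times in values, else 0."""
    return 1 if z in Counter(values).values() else 0


class FrequencyQuerySpec(unittest.TestCase):
    def test_prints_1_if_any_integer_is_present_z_times(self):
        # Six 8s, so a query for frequency 6 succeeds.
        self.assertEqual(frequency_query_output([8] * 6, 6), 1)

    def test_prints_0_if_no_integer_is_present_z_times(self):
        # 8 occurs six times and 3 occurs once; nothing occurs twice.
        self.assertEqual(frequency_query_output([8] * 6 + [3], 2), 0)
```

Saved to a file, this can be run with python -m unittest. The names say what happens, but they still do not say why anyone would want a frequency-of-frequencies query, which is the gap described above.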

Great practitioners in great organizations might take care to follow domain-driven design, which would give us shared, predefined, and more articulate terms for naming in our codebase. Even so, we will often run into cases where the meaning of the code is hard to express in short bursts, and we will find times where we can’t yet find concise abstractions. Our commenting practices cannot be built around perfect practices, because we are not perfect programmers.

There is also the contrarian view by John Ousterhout that “in-code documentation plays a crucial role in software design” and that “inadequate documentation creates a huge and unnecessary drag on software development.” I am more sympathetic than most to this argument, as shown by the fact that I started this with a comment. However, it simultaneously makes me consider that the test might be the more logical place to keep most comments, as that is where the specification itself lives! The test provides a direct example of why we care about the behavior of the source code, and seeing the intentional byproduct allows us to contextualize what may be confusing behavior. For example, I have seen code doing the obviously incorrect thing accompanied by the comment: // THIS IS WRONG but the downstream provider depends on this behavior. If we place the comment alongside the test, the explanation of the correct-incorrect behavior sits next to one or more examples of it.
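As a rough sketch of what that could look like, here is a contrived Python example in which the explanation lives with the test that pins the odd behavior down. The whole_dollars function, the truncation quirk, and the downstream provider are all hypothetical, made up only to illustrate the idea.

```python
import unittest


def whole_dollars(cents: int) -> int:
    """Convert a non-negative amount in cents to whole dollars (hypothetical example)."""
    return cents // 100


class WholeDollarsSpec(unittest.TestCase):
    def test_truncates_instead_of_rounding(self):
        # THIS IS WRONG in the arithmetic sense: 199 cents should round to 2.
        # The (hypothetical) downstream provider depends on truncation, so this
        # test documents the behavior and will fail if someone "fixes" it.
        self.assertEqual(whole_dollars(199), 1)

    def test_exact_dollar_amounts_are_unchanged(self):
        self.assertEqual(whole_dollars(200), 2)
```

Anyone who changes the function to round will hit a failing test whose comment explains why the behavior exists, which a comment in the source alone cannot enforce.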

However, I’d like to do better. We already have code coverage tools. Could we apply the same concept so that our tests become a more explicit documentation source for our source code? I’m not ready to commit to writing such a tool, but conceptually, I think it would enable a documentation approach like this, and might also encourage some of the practices that formed around behavior-driven development. This tool could also augment big-picture documentation if done properly.

It would also encourage writing tests for confusing code, which might give programmers the courage to refactor it.

It’s worth an experiment.

Addendum 2023-01-05: It turns out that the D language has something that would enable this very well. Its built-in unit test framework turns unit tests into examples and includes comments in the generated documentation, and the unit tests are connected to the code. This looks like a wonderful addition. Also, doctest in Elixir’s ExUnit looks like it is built with a similar mindset.
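Python's standard doctest module is a rough analog of the same mindset: the examples in a docstring are both documentation and executable checks. The sketch below is an illustration under that assumption, not a claim about how D or Elixir implement their versions. The function name and the failed_examples helper are hypothetical.

```python
import doctest
from collections import Counter


def is_any_integer_present_with_frequency(values, z):
    """Report whether any integer occurs exactly z times in values.

    Six 8s means a frequency of 6 is present:

    >>> is_any_integer_present_with_frequency([8] * 6, 6)
    True

    No integer occurs exactly five times here:

    >>> is_any_integer_present_with_frequency([8] * 6, 5)
    False
    """
    return z in Counter(values).values()


def failed_examples(func) -> int:
    """Run the examples in func's docstring and return how many of them failed."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    return doctest.DocTestRunner(verbose=False).run(test).failed
```

Saved to a file, the docstring examples can also be checked with python -m doctest, so the prose explaining the behavior and the examples proving it live in the same place.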

¹ Let’s be fair: we cannot treat a tweet as a full argument, and I am sure there is nuance that wouldn’t be captured. However, the extended argument seems to be that teams should be using mob programming and practices such as the naming suggestions in Clean Code to make sure of a full and shared understanding such that further documentation is unnecessary. Although I am a proponent of mob programming, I do not think it fully absolves the need to write proper documentation, because we are humans who have blind spots. Code seems obvious at the time it is written; it will not necessarily be obvious years later when the original writers have left. Even though documentation can age poorly, having original intent is often as valuable as what the code is doing in the present. That’s for another blog.
² If you are interested in better advice about naming variables, Arlo Belshee wrote the best work I’ve seen on the subject. Belshee might argue that an extremely long name here would be appropriate until the team can find a better abstraction in general.