Our Blog

Ongoing observations by End Point people

Tests are not Specs

By Steven Jenkins · Friday, September 4, 2009

Tags: rails, testing

We're big fans of Test Driven Development (TDD). However, a co-worker and I encountered some obstacles because we focused too intently on writing tests and didn't spend enough up-front time on good, old-fashioned specifications.

We initially discussed the new system (which is a publish/subscribe interface used to do event management for a reasonably large system, which totals around 70K lines of Ruby). My co-worker did most of the design and put a high-level one-pager together to outline how things should work, wrote unit tests and a skeleton set of classes and modules, then handed the project to me to implement.

So far, so good. All I had to do was make all of the tests pass, and we were finished.

We only had unit tests, no integration tests, so there was no guarantee that once I was done coding, that the integration work would actually solve the problem at hand. In Testing (i.e., the academic discipline that studies testing), this is referred to as a validation problem: we may have a repeatable, accurate measure, but it's measuring the wrong thing.

We knew that was a weakness, but we pressed ahead anyway, expecting to tackle that later. As an example, we identified 3 different uses of this publish/subscribe event management mechanism that had wildly different use cases. When we discussed these with the customer, he clarified that one of the use cases is needed in the immediate term, one is useful in the short term, and that the third is out of scope. Getting that information was helpful in keeping us on track, and not having the scope grow unmanageably.

Tests are code and no code (of sufficient size and complexity) is bug-free; thus, tests have bugs.

When tests are the only spec, what is the best way to proceed?

The developer can assume tests are correct and Make The Tests Pass; clearly that is not always the best approach. It's better for the developer to exercise judgement and fix obvious errors. However, the developer's judgement can be wrong, so the test writer needs to pay special attention to any changes to the tests (and the need to catch problems implies that the test designer and developer need to be in tight communication -- don't just hand your co-worker your tests and then go on a long vacation).

Sometimes tests aren't buggy, per se, but they may be ambiguous. Variable names may not communicate clearly. They may be too large and not clearly test one thing. They may be too broad and leave unspecified the intent or design parameters in mind.

In our recent experience, one test had the following code (slightly altered):

 it 'should pass event to each callback in sequence' do
  listener = mock 'listener' callback_seq = sequence 'callbacks'


What's wrong with this? On the surface, nothing is wrong, until the bigger picture is viewed: there is no other mention of a listener anywhere else in the tests, high-level design document, or code. Is a listener a subscriber? Should there be a separate listener class somewhere? After all, mock 'foo' often means that there should be a foo object. Perhaps the test developer forgot to include a file (or the developer overlooked it).

What actually transpired is not so mundane, but it identified a very different approach in testing. My colleague made the observation that it doesn't matter if a listener is a subscriber or not for this particular test, as it's really only a syntactic placeholder: we could do a variable renaming for listener and that should not change the meaning of the code.

While his observation is true and correct, it ignores Abelson and Sussman's viewpoint that "programs must be written for other humans to read, and only incidentally for machines to execute." As the implementer behind the pseudo-Chinese-wall of his tests, I expected the tests to tell me how the universe of this system should be constructed, and the mention of a listener communicated something other than the intended message.

Sometimes, even unit tests require extensive setup, and it can be tempting to add in extra tests and checks which don't add a lot of value but instead make the intent unclear, make the tests themselves less DRY, and give yet another opportunity to introduce bugs. One example looked something like:

describe 'creating subscriber entry' do
  before do
    @subscriber = stub 'subscriber'
  describe 'with a method name' do
    it 'should create a block that invokes the method name' do
      class << @subscriber                     
        attr_accessor :weakref, :last_received                      

        def callback(e)                         
          self.last_received = e
          self.weakref = self.respond_to?(:weakref_alive?)
      entry = @publisher.class.create_subscriber_entry(@subscriber, :callback)
      entry.size.should == 2
      event = stub 'event'
      @subscriber.last_received.should == event
      @subscriber.weakref.should be_true

Note that the purpose of the test is Creating a subscriber entry with a method name should create a block that invokes the method name, yet the test checks the size of the subscriber entry, verifies that the event is received, and that the callback itself is stored via a weak reference so that it can be garbage collected. Each of those should be in separate tests. In fact, the stated goal of the test is only implicitly checked. Better is something like

  class << @subscriber
    attr_accessor :sentinel
    def callback(e)
      self.sentinel = true
  entry = @publisher.class.create_subscriber_entry(@subscriber, :callback)
  event = stub 'event'
  @subscriber.sentinel.should be_nil
  @subscriber.sentinel.should be_true

which only tests that the named callback is invoked, properly setting a sentinel value.

If tests are being used as specification, then they will hide important details. A simple example if the type of storage to use for a particular set of values (for us, it was callbacks). Should they be in an array? a hash? a hash of arrays? something else?

How to handle this is a little more tricky -- implementation details like this could arguably not be part of a set of tests, as the behavior is the driver, not implementation details. However, without a specification, or a design document that outlines what kinds of performance characteristics we should aim for here, the implementer has to make choices, and those choices are not necessarily what the test writer would have wanted.

There is no one right answer there: for those who want to only use tests, then the tests need to be complete and cover the implementation details. Of course, this means that if a future scaling problem requires a change in data structures, then the test will also need to be ported to the new architecture. If specifications or design documents are used, that can speed the implementer's work, but leaves open some questions of correctness (e.g., did the implementer use the right architecture in the right way).

We solved these problems (and more) in true Agile fashion: through good communication among the customer,the test designer, and the developer, but this experience reinforced to us that tests alone are insufficient, and good communication needs to be maintained in the development process.

Real specs can help, too.