Info
This post is part of my blog series about My Simplicity Toolkit. I highly suggest you read the introductory post before reading this one.

In the introductory post of this blog series I wrote about the verb “to complect”, which means to braid things together. In this article, I give examples of things related to testing that tend to be unnecessarily complected.

Building up a suite of maintainable tests is all about trying to avoid complecting things unless truly necessary. Notice that complecting things is necessary at times, just not always.

Tests are naturally complected with the implementation they verify. If an implementation is broken, its tests will (hopefully!) break by design. However, every developer has a choice of how large an implementation surface a test should cover. Have a look at this figure:

A drawing comparing different levels of coupling between tests and implementation: on the left, tests that span many components; on the right, tests that each cover one and only one component’s behaviour.

Depending on which level of coupling between tests and implementation a developer chooses, you get vastly different levels of complexity. On the left side, changing an implementation will cause a lot of tests to break, and changing an implementation becomes slower and slower as you add more tests; they will all need to be updated! If you instead take the route on the right side of the figure, you will have a constant set of tests that need to be updated over time. My previous articles “Testing strategy over time” & “On narrow integration tests” talk about this at length.

The following is a list of things I keep in my simplicity toolkit when it comes to testing:

Test outcome over implementation Link to heading

In my introduction above I spoke about the high cost of updating tests every time I modify an implementation.

To reduce the likelihood of that happening, tests should verify what an implementation does (the outcome), not how it does it. In other words, your tests should not depend on the nitty-gritty inner details of an implementation. A different way of looking at it is that a rewrite of how a class or a function does something shouldn’t require changing any of the tests, as long as the outcome is the same. This makes refactoring a lot easier.
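For example, here is a minimal sketch (the function unique_sorted is a hypothetical example of mine, not from the articles above) of a test that only asserts the observable outcome. Rewriting the implementation, say from a hand-rolled loop to sorted(set(...)), keeps the test green:

def unique_sorted(values):
  # The “how” can change freely; only the outcome below is verified.
  return sorted(set(values))

def test_unique_sorted_returns_sorted_values_without_duplicates():
  # Given a list with duplicates in arbitrary order
  values = [3, 1, 3, 2]

  # When I ask for the unique, sorted values
  result = unique_sorted(values)

  # Then I only care about the outcome, not how it was computed
  assert result == [1, 2, 3]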

Other test doubles over mocks Link to heading

According to Martin Fowler, a “test double” is

[…] a generic term for any case where you replace a production object for testing purposes.

Mocks, spies, fakes, stubs, and dummies are all examples of test doubles. Fowler further defines a Mock as

[…] pre-programmed with expectations which form a specification of the calls they are expected to receive.

As already stated in the previous section, tests should be coupled to outcome, not implementation. Mocks tend to be horrible at this: you pass an object into your implementation that has been pre-programmed with exactly how it should behave, based on how the implementation works. I think Ian Cooper explains it quite well here.

Another problem with mocks is that they tend to be quite complex to use. I programmed professionally in Java for many years and still had to look things up in the Mockito reference documentation. Other types of test doubles tend to be much simpler to use and understand.
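As an illustration, here is a minimal sketch (the names UserService and InMemoryUserRepo are hypothetical) of a fake: a real but simplified implementation that the test can assert outcomes against, instead of pre-programming expected calls on a mock:

class InMemoryUserRepo:
  # A fake test double: a real, simplified implementation backed by a dict.
  def __init__(self):
    self._users = {}

  def save(self, user_id, name):
    self._users[user_id] = name

  def find(self, user_id):
    return self._users.get(user_id)

class UserService:
  def __init__(self, repo):
    self._repo = repo

  def register(self, user_id, name):
    self._repo.save(user_id, name)

def test_registering_a_user_makes_them_findable():
  repo = InMemoryUserRepo()
  service = UserService(repo)

  service.register(42, "Ada")

  # The assertion is on the outcome (the user can be found),
  # not on which calls the repository received.
  assert repo.find(42) == "Ada"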

Strictly speaking, I only see two use cases for mocks:

  • If I am coding in a strictly typed programming language, and I am using a third-party library that does not adhere to the dependency inversion principle, i.e., it doesn’t have interfaces that I can implement to change its internal behavior.
  • Certain hackery when it comes to third-party libraries that use statics and singletons.

I avoid mocks as much as I can. I guess that makes me a classicist tester.

Given/When/Then over unclarity about what is being tested Link to heading

A common structure for automated tests is to split them into three stages: “Given, When, Then” or “Arrange, Act, Assert” (two variants of the same thing). Here is an example:

def my_test():
  # Given two numbers:
  a = random_number()
  b = random_number()
  
  # When I call the sum function:
  c = sum([a, b])
  
  # Then the result should be the sum of the numbers
  assert c == a + b

Compare the above test to this:

def my_test():
  a = random_number()
  b = random_number()
  c = sum([a, b])
  assert c == a + b

The latter example makes it unclear what it is trying to test: are we checking that the results returned from random_number() can be summed, or are we checking that sum(...) behaves as we expect?

“I don’t dare to remove this test, because it might be testing something there isn’t a test for elsewhere.”

If we don’t clarify what we are testing (the When or Act), developers won’t dare to remove tests over time. Suddenly we have a large, ever-growing test suite - complecting our tests with the implementation more and more. You end up having to update tons of tests every time you make a change to an implementation.

The above problem can be a huge headache in larger code bases.

Naming tests by outcome over how Link to heading

Related to the above, it’s important to name your tests after the behavior they are testing, not how it is implemented. This includes mentioning the expected outcome. For my example tests above, testThatTwoRandomNumbersSumUpCorrectly is a much better name than testThatSumLoopingWorks. That the sum(...) function uses a loop internally is irrelevant, and “works” says nothing about the expected outcome.

This article looks to be a good one when it comes to the naming of tests if you would like to know more.

Unit tests over integration tests Link to heading

An integration test is, by definition, a test that exercises multiple units together. Compare that to a unit test, which only tests a single unit. This means that an integration test complects with more parts of your implementation.

My article “Testing Strategy Over Time” talked about this at length.

This practice aligns very well with the classical Testing Pyramid.

Narrow integration tests over (general) integration tests Link to heading

…and if I write an integration test, I make it narrow. Narrow in this case means that to test an integrated set of units, I test how they integrate and work with each other pairwise. If they all work well with each other pairwise (narrow integration tests), there is no need to test them all together (wide integration test).

For example, I test that two layers in my application interact the way they should, instead of testing all layers together. This means that I don’t need an actual database to test the HTTP controller layer. Instead, I make sure that the HTTP layer delegates calls properly to the business/service layer, and so on.
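To make that concrete, here is a minimal sketch (the names OrderController and FakeOrderService are hypothetical, and the controller is framework-free for brevity) of a narrow integration test of the HTTP layer against a fake service layer, with no database involved:

class FakeOrderService:
  def __init__(self):
    self.created = []

  def create_order(self, item, quantity):
    self.created.append((item, quantity))
    return {"id": 1, "item": item, "quantity": quantity}

class OrderController:
  def __init__(self, service):
    self._service = service

  def post_order(self, request_body):
    order = self._service.create_order(request_body["item"], request_body["quantity"])
    return 201, order

def test_posting_an_order_delegates_to_the_service_layer():
  service = FakeOrderService()
  controller = OrderController(service)

  status, body = controller.post_order({"item": "book", "quantity": 2})

  # The HTTP layer is verified pairwise against the service layer only.
  assert status == 201
  assert service.created == [("book", 2)]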

My article “On narrow integration tests” talked about this at length.

Testing in production over end-to-end tests Link to heading

Warning
Controversial!

Tests are, in a way, a liability: there is a cost to maintaining them. The wider the integration test, the higher the cost over time. The widest possible integration test, and thus the most expensive to maintain, is an end-to-end test. It is a test that starts up your entire application and needs all your supporting infrastructure in place (databases, queues, caches…) to be able to run.

The most realistic test is a real user exercising the functionality. Nothing can beat it. Users will double-click the buttons you’ve only clicked once, and they will enter 1000 characters in the text field where you only entered 10.

While having an automated test that makes sure your application can start can be useful, getting it out in production to test with actual users has the benefit of quicker feedback, without slowing down your development with complex, slow, and brittle tests.

Don’t get me wrong, though! To be able to test in production you need a good foundation:

  • Quick deploys.
  • A solid CI/CD system that can do a gradual rollout.
  • A feature flag system to toggle new features for a subset of users (see the sketch after this list).
  • Solid observability in place to understand the success rate for user behaviours.
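For the feature flag bullet above, here is a minimal sketch (the function is_feature_enabled is hypothetical; real setups typically use a dedicated feature flag service) of how a deterministic hash can bucket users so a feature is only enabled for a configurable percentage of them:

import hashlib

def is_feature_enabled(feature_name, user_id, rollout_percentage):
  # Hash the feature/user pair into a stable bucket from 0 to 99.
  key = f"{feature_name}:{user_id}".encode()
  bucket = int(hashlib.sha256(key).hexdigest(), 16) % 100
  return bucket < rollout_percentage

def checkout(user_id):
  # Example: roll out a new checkout flow to 10% of users.
  if is_feature_enabled("new_checkout", user_id, rollout_percentage=10):
    return "new checkout flow"  # placeholder for the new code path
  return "existing checkout flow"  # placeholder for the current code path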

References Link to heading