Good vs bad test helpers

Marcin Gryszko
9 min read · Oct 20, 2020

This time I'll talk about the use and abuse of test helper methods. But first, let's start (again) with a short description of why we write tests and what a test is for.

A test sends stimuli to the test subject and examines the response. In plainer language, the test prepares some input data, then executes a method on a class or a function in a module, and finally compares the output values with the expected ones. This is the most basic goal of a test. You can reach it even with spaghetti test code. Your computer will be able to interpret it and won't complain.

Another high-level testing goal is to make the System Under Test (SUT) understandable to a test reader and maintainer (so-called Test as Documentation). They should be able to get an idea of what's going on without looking at the SUT code:

  • Why are the outputs as they are? How do they connect to the inputs? What does the SUT do to transform them? What are the side effects of the SUT?
  • What are the SUT's external collaborators? Does the SUT ask them for data or tell them to do things on its behalf? What inputs do they receive? What do we do with the return values?
Rube Goldberg by rocor, licensed under CC BY-ND 2.0, captions added by me

Test helpers

If the logic to check is simple and the SUT is self-contained (no external collaborators), tests are very simple too. Think about the FizzBuzz kata — you have to try really hard to make FizzBuzz tests complicated.
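
For example, assuming a fizzBuzz(int) function, a complete test case fits in a few lines:

@Test
void returnsFizzForMultiplesOfThree() {
    assertEquals("Fizz", fizzBuzz(3));
}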

When our behavior and data structures get more complex, tests become longer. We need more real estate to prepare the inputs and to check the outputs (read: longer and longer Arrange and Assert parts). Test methods start growing in size. We counteract by extracting helper methods and/or by setting up some parts of the fixture implicitly. I'll quickly describe the three most popular mechanisms that apparently deal with test complexity.

Creation methods

The name is self-explanatory. They hide and encapsulate the gory details of the input creation, especially when the process is tedious:

Invoice createInvoiceWithLineItems() {
    var invoice = new Invoice();
    invoice.setId(12345);
    invoice.setAmount(new BigDecimal(100));
    var customer = new Customer();
    customer.setFirstName("Marcin");
    customer.setLastName("Gryszko");
    invoice.setCustomer(customer);
    // ...
    return invoice;
}
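
Used in a test, the whole Arrange part collapses into a single line (the total() method is an invented SUT for illustration):

@Test
void calculatesTheInvoiceTotal() {
    var invoice = createInvoiceWithLineItems();

    assertEquals(new BigDecimal(100), invoice.total());
}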

Custom assertions

They try to hide the complexity of checking the result or of verifying test doubles:

void assertInvoice(Invoice invoice) {
    assertEquals(12345, invoice.getId());
    assertEquals(new BigDecimal(100), invoice.getAmount());

    verify(invoiceCreator).create(invoice);
}

You may refer to my article Painless Assertions for a thorough discussion about why they are problematic for value comparisons.

Creation Method and Custom Assertion are examples of Test Utility Methods. If you are interested in the detailed taxonomy of test helpers, please refer to the XUnit Test Patterns book.

Implicit Setup

If some parts of the test preparation are repeated between test cases, you may move them either to a method automatically executed before each test (annotated with @Before or @BeforeEach in the JUnit framework) or to a field initializer. It may use Creation Methods to further partition setup complexity.

Why implicit? Because the setup is not directly expressed in the test case and you are relying on the framework-fu to execute it.
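
A minimal sketch in JUnit 5 (the repository cleanup is an illustrative example of repeated preparation, not from a concrete codebase):

class InvoiceRepositoryTest {
    // field initializer: runs for every fresh test instance
    InvoiceRepository invoiceRepository = new InvoiceRepository();

    // executed by the framework before each test case
    @BeforeEach
    void cleanUp() {
        invoiceRepository.deleteAll();
    }

    // test cases below rely on a clean repository without stating it
}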

Essential and accidental complexity in tests

Before I jump into the discussion of whether helper methods are harmful and how they can be useful, let's stop for a moment and reflect on what essential and accidental complexity mean when applied to tests. The concepts of essence and accident were brought to software development by Fred Brooks in his famous essay No Silver Bullet.


Fundamental properties that are absolutely necessary to say that a thing is that thing: this is the essence. If we add a property, permanently or temporarily, and the thing remains the thing, that is an accident. Narrowed down to complexity, the inherent difficulty of the problem is essential. Difficulties added on top, either by technology or by us, are accidental.

Assuming that the System Under Test is a black box (I'm leaving out whether it is clean and neat or holds a spaghetti monster inside), the essential complexity of a test is the fundamental difficulty of instantiating and executing the SUT. If we remove an element (say, part of the input), we lose the ability to perform the test. The essence of our automated experiment disappears. Note also that accidental SUT complexity (bad design, hard-to-test code) can increase the essential test complexity.

Accidental complexity in automated tests is added mostly by us and less by tools (test execution frameworks, assertion and mocking libraries). Any unnecessary burden introduced by us that makes it hard to understand how the production code works just by looking at the test, and to grasp it quickly without heavy inside-of-your-head parsing and mental model generation: this is accidental complexity. Anything we do that increases the cost of extending the tests (which will happen many times during the product lifecycle) is accidental complexity too.

Other factors contribute to accidental test complexity, but let's stop here and look at complexity from another perspective.

Essentiality and relevance

Another lens for looking at tests is the pair of concepts of essentiality and relevance, applied to the Arrange and Assert parts (I assume here that the Act part, the SUT execution, is essential and relevant by nature). How can we classify the fixture setup and SUT verification elements?

Essential and relevant

These parts are really required to execute the behavior of the SUT and demonstrate its functioning: how it processes the arguments and which results it produces (directly, or indirectly through side effects). If you remove an element of the fixture, the test will break, either through an error in the SUT execution or through an assertion. If you remove a part of the verification, the test loses its reason for existence. A test without automated checks is not a test but another kind of main method executed by the framework.

Essential and irrelevant

Without these elements, it is impossible to execute the test. However, they are not needed to show the test reader how the SUT works. Examples:

  • fill-in arguments required to create inputs: new Invoice(12345, new BigDecimal(100), anyCustomer)
  • infrastructure setup — e.g. obtaining a database connection and performing an initial cleanup

Non-essential and irrelevant

These are in the test by accident. They can be safely removed and the test still passes, maintaining its essence. If you find them, you are probably not working with a Minimal Fixture.

Good vs bad test helpers

After the digression about essence, accident, and relevance, let’s come back to the test utility methods.

What are our intentions when we extract them? Are we lowering complexity by introducing them? Or are we accidentally increasing complexity? Are we contributing to the “test as documentation” goal? Or are we making our tests obscure, hiding the SUT behind a wall of smoke and haze?

The essence of test helpers can be expressed in terms of their relationship to accidental test complexity:

Good test helpers remove accidental complexity. Bad test helpers add accidental complexity.

Good test helpers hide irrelevant details. Bad test helpers hide essential details.

Bad test utility methods hide the connections between essential inputs and expected outputs. They bury the inputs behind layers of functions and hide them from the test reader. They prevent us from drawing a mental line between how the SUT is fuelled and the outcome of the execution. If the SUT shows symptoms of complexity (i.e. the test becomes lengthy and repetitive), they sweep the essential SUT complexity under the rug, giving the test maintainer the impression that the world is beautiful and full of unicorns.

Good test utility methods hide the essential but irrelevant details, emphasizing what is essential and relevant to the SUT. They contain zero accidental and irrelevant details. They help the test maintainer to understand the association between the fixture and the result.

Let me illustrate the characteristics of good and bad test helpers with a couple of examples.

Example — bad test helpers

The following code fragment shows a test of the create invoice use case. The use case works with two collaborators: authenticatedCustomerFinder (which knows how to get the authenticated customer) and invoiceRepository (which persists the invoice).
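
Here is a sketch of what such a test might look like (the use case and collaborator signatures are assumptions; only the helper and collaborator names come from the discussion below):

class CreateInvoiceUseCaseTest {
    AuthenticatedCustomerFinder authenticatedCustomerFinder = mock(AuthenticatedCustomerFinder.class);
    InvoiceRepository invoiceRepository = mock(InvoiceRepository.class);
    CreateInvoiceUseCase useCase = new CreateInvoiceUseCase(authenticatedCustomerFinder, invoiceRepository);

    Invoice invoiceToSave;

    @BeforeEach
    void setup() {
        createInvoiceToSave();
    }

    @Test
    void createsInvoiceForTheAuthenticatedCustomer() {
        mockCustomerFinder();

        useCase.createInvoice(12345, new BigDecimal(100));

        assertInvoiceSaved();
    }

    // assigns to a test class field instead of returning the invoice
    void createInvoiceToSave() {
        invoiceToSave = new Invoice(12345, new BigDecimal(100), aCustomer());
    }

    // a Creation Method for the collaboration with a test double
    void mockCustomerFinder() {
        when(authenticatedCustomerFinder.findAuthenticatedCustomer()).thenReturn(aCustomer());
    }

    void assertInvoiceSaved() {
        verify(invoiceRepository).save(invoiceToSave);
    }

    Customer aCustomer() {
        return new Customer("Marcin", "Gryszko");
    }
}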

The use case logic is fairly simple, but there is a lot going on in the test. What’s wrong with the code?

  • The fixture is partially created in the implicit setup (the setup method delegating to createInvoiceToSave), in the test case itself (the invoice id and amount passed to the use case), and in the mockCustomerFinder method. mockCustomerFinder is a Creation Method too: it creates the specification of the collaboration with one of the test doubles!
  • Looking at createInvoiceToSave, we don't know why we need to pass that specific invoice id and amount to the invoice constructor. Why is the customer required? Where will it come from?
  • createInvoiceToSave is not context-free. It changes its outer scope (a test class field). We don't know why we need to assign the invoice to a member variable and how this member variable is used afterward. Why are we not returning the newly created object?
  • The test method doesn't show how the direct inputs (id and amount) are used in the SUT. Without diving into the createInvoiceToSave method, we don't know where and how those values are used.
  • We don't see that the use case collaborates with the finder and the repository, nor what the protocol between them is (which methods it calls on the collaborators and which arguments it passes to them). The details are hidden in mockCustomerFinder and assertInvoiceSaved.
  • In assertInvoiceSaved, we don't know where the invoice came from. Which properties does it have? Were they calculated by the SUT? Were they retrieved from another collaborator? Were they passed in from outside?

The leitmotif of my complaints is the inability to see the cause and effect between the inputs and the outputs. This smell is known as Mystery Guest, and it's a common cause of incomprehensible tests.

The refactored version inlines and removes the test helpers as they were making the test obscure. The test is much shorter and you can draw a connection between the inputs, the SUT, and the outputs.
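
Under the same assumed signatures, the refactored test could read:

@Test
void createsInvoiceForTheAuthenticatedCustomer() {
    var customer = new Customer("Marcin", "Gryszko");
    when(authenticatedCustomerFinder.findAuthenticatedCustomer()).thenReturn(customer);

    useCase.createInvoice(12345, new BigDecimal(100));

    verify(invoiceRepository).save(new Invoice(12345, new BigDecimal(100), customer));
}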

Example — good test helpers

The first example shows a Creation Method that instantiates a part of the fixture (an invoice object) that is essential to the test. It hides irrelevant details, like the other arguments passed to the constructor (let's hope the rest of the constructor args get some minimal, acceptable values). Used in a test, the method helps the reader focus on what's important: that we need an invoice, and that only its id and amount are required by the SUT.

Invoice invoiceWithIdAndAmount(int id, BigDecimal amount) {
    return new Invoice(id, amount /* , ... other, irrelevant properties */);
}
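
In a test it could read like this (the discount calculation is an invented SUT for illustration):

@Test
void appliesDiscountToTheInvoiceAmount() {
    var invoice = invoiceWithIdAndAmount(12345, new BigDecimal(100));

    var discounted = discountCalculator.apply(invoice);

    assertEquals(new BigDecimal(90), discounted.getAmount());
}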

The second example shows a test of a cache wrapper that adds metrics on cache operations (I'm using the Dropwizard Metrics API for this example). Before executing any method on the cache, we have to register all the metrics in the registry. This is an essential step (without it the cache wrapper won't work) but irrelevant to understanding how gets, puts, and hits are computed. registerMetrics hides those details.
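
A sketch of that test (the MeteredCache wrapper, its operations, and the metric names are assumptions; only registerMetrics comes from the original example):

class MeteredCacheTest {
    MetricRegistry registry = new MetricRegistry();
    MeteredCache cache = new MeteredCache(registry);

    @Test
    void countsCacheHits() {
        registerMetrics(); // essential, but irrelevant to how hits are computed

        cache.put("invoice", "12345");
        cache.get("invoice");

        assertEquals(1, registry.counter("cache.hits").getCount());
    }

    // registers every metric the wrapper updates; hidden because it's irrelevant here
    void registerMetrics() {
        registry.counter("cache.gets");
        registry.counter("cache.puts");
        registry.counter("cache.hits");
    }
}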

We could improve the code even more by moving the metrics registration to the implicit setup part, focusing the test case even more on the metrics computation:

@BeforeEach
void setup() {
    registerMetrics();
}

Summary

If you need to create a test utility method, make sure that it emphasizes what is relevant and hides what is irrelevant to understanding the production code. Don't increase complexity with a test helper. And if you see your test growing in size, don't rush to partition it. Maybe you are overlooking precious feedback telling you to consider redesigning the production code.
