Effective test doubles, part 2

Nov 28, 2020

This is the second (and last) part of my longer article about applying test doubles effectively. In part 1, I presented tips on tactical mock/stub usage: assuming that you were already using mock objects, I gave hints on how to squeeze the most out of them and avoid increasing test complexity. In this part, I’ll talk about when and how to introduce mock objects in your tests. The following advice doesn’t require you to follow test-driven development; it applies even if you write the tests after implementing the production code (although I strongly recommend the former approach).

Introduce test doubles for boundaries to keep outside of your tests

There are two schools of TDD: classist (Chicago) and mockist (London). You can find plenty of material about them on the Internet. Here I’ll describe what I consider the essence of both schools (applicable even if you don’t test first!): their approach to the usage of mocks.

The mockist way — mocks as a design tool

You think hard about the design of the unit you are about to test. Which responsibilities should belong to the System Under Test (SUT)? Which responsibilities do you want to delegate to collaborators? Then you make an assumption (or an educated guess) about the design and you define contracts with SUT neighbors using mocks and stubs. If you are following TDD, this is done in the red phase. You iterate on the collaboration until you are satisfied.

In the next step, you implement the provider part of the contract (what the test double replaced on the client side). The mock/stub specifications on the client side give you examples of how the consumer wants to interact with the provider. You use these invocation examples (either literally equal or belonging to the same class of inputs) to drive the provider tests.

You repeat the process until you hit the infrastructure (where you should be using neither mocks nor stubs).

In the example below, you start from the client (the use case), define what is required from the repository using a mock, and finally you implement the concrete, MySQL-backed repository which you test in isolation:

import io.mockk.Runs
import io.mockk.every
import io.mockk.just
import io.mockk.mockk
import io.mockk.verify
import org.junit.jupiter.api.Test

class CreateInvoiceUseCaseTest {
  val invoiceRepository = mockk<InvoiceRepository>()
  val useCase = CreateInvoiceUseCase(invoiceRepository)
  
  @Test fun create() {
    // contract definition, client side
    every { invoiceRepository.save(anInvoice) } just Runs
 
    useCase.execute(anInvoice)
 
    verify { invoiceRepository.save(anInvoice) }
  }
}

// contract verification, provider part
class MysqlInvoiceRepositoryTest {
  val repository = MysqlInvoiceRepository()
  
  @Test fun save() {
    val invoice = Invoice(...)

    // execute and verify contract provider
    repository.save(invoice)

    // check in the DB that the invoice was saved
  }
}

What is a contract? According to Wikipedia:

A contract is a legally binding document between at least two parties that defines and governs the rights and duties of the parties to an agreement.

My own, down-to-earth definition is:

What the party expects from me and what I expect from the party.

The contract has two parts:
- visible (syntactic) part: defines the type of the method/function to invoke. In statically typed languages, this is the method signature.
- hidden (semantic) part: describes the expected observable effect of executing the contract (i.e. invoking the method/function) and how the computations referenced by the contract interrelate. For example, if I clear a mutable collection, then I can expect its size to be 0. If I remove an element from an immutable collection, then I can expect to get a copy of the collection without that element.
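To make the semantic part tangible, here is a minimal Kotlin sketch (not from the original article) that turns the two collection examples into executable checks. The standard library signatures of MutableCollection.clear() and the minus operator on a read-only List (standing in for the immutable collection) are the syntactic part; the helper names checkClearContract and checkRemoveContract are hypothetical and encode the semantic part, so the same checks can run against any provider of those signatures.

```kotlin
// Semantic part of clear(): afterwards, the collection must be empty.
// Works for any provider of the MutableCollection signature (list, set, ...).
fun <T> checkClearContract(collection: MutableCollection<T>) {
    collection.clear()
    check(collection.size == 0) { "clear() must leave the collection empty" }
}

// Semantic part of removing from a read-only list with the minus operator:
// a new list with one occurrence fewer, while the original stays unchanged.
fun <T> checkRemoveContract(original: List<T>, element: T) {
    require(element in original) { "precondition: the element must be present" }
    val snapshot = original.toList()   // copy taken to detect any mutation
    val copy = original - element      // Iterable<T>.minus(T) drops the first occurrence
    check(original == snapshot) { "the original list must not change" }
    check(copy.size == original.size - 1) { "the copy must lack exactly one element" }
    check(copy.count { it == element } == original.count { it == element } - 1) {
        "exactly one occurrence of the element must be removed"
    }
}
```

Calling checkClearContract with both mutableListOf(1, 2, 3) and hashSetOf("a", "b") exercises two different providers of the same syntactic contract against the same semantic expectations.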

The classist way — mocks as test complexity reducer

If you keep adding production code to the unit under test (following either a test-first or a test-last approach) and cover the new functionality with tests, you quickly notice that:

- there is duplication in the test code because you repeat the same setup over and over again to reach some specific SUT logic
- you don’t know what is actually going on in the test and the production code. You lose track of how inputs get transformed into outputs and what role the SUT plays in this process. Your test is checking multiple unrelated behaviors and code paths, an antipattern known as the integrated test.

To fix that (if you haven’t already), you start redesigning (or refactoring) the SUT. You extract new collaborators and shift parts of the original SUT responsibilities to them. You cover them with isolated tests. Finally, you introduce mocks in the original test to replace the newly created collaborators. If you are following TDD, this is done when your tests are green (and they should stay green while refactoring).

Take this initial use case test. It repeats some setup steps because we want to exhaustively check all the cases of the complex VAT calculation logic together with the rest of the invoice creation logic:

class CreateInvoiceUseCaseTest {
  @Test fun `standard VAT rate`() {
    val items = listOf(
      Item(sku = standardVatRateSku, amount = 102) 
    )
 
    // check if the right VAT has been calculated 
    assertEquals(Invoice(totalVat = 123), useCase.execute(items))
  }
  
  // test for reduced VAT rate
 
  // test for zero VAT rate
  
  // test for multiple items with mixed VAT rate classes
  
  // ...
}

After the refactoring, the VAT calculation lives in a separate component. The VAT calculator is replaced by a stub in the use case test, and a new, isolated test is created for the VAT calculation logic:

class CreateInvoiceUseCaseTest {
  val vatCalculator = mockk<VatCalculator>()
  val useCase = CreateInvoiceUseCase(vatCalculator)
 
  @Test fun create() {
    every { 
      vatCalculator.calculateFor(TaxableItem(sku, netAmount), ...) 
    } returns aVatAmount
    val items = listOf(Item(sku = sku, netAmount = netAmount), ...)
 
    // check if the VAT from the calculator is inside the invoice
    assertEquals(Invoice(totalVat = aVatAmount),
      useCase.execute(items))
  }
}

class CalculateVatTest {
  val vatCalculator = ThreeCategoriesVatCalculator()
 
  @Test fun `standard VAT rate`() {
    val items = listOf(
      TaxableItem(sku = standardVatSku, netAmount = 123)
    )
 
    // check if the right VAT has been calculated 
    assertEquals(Vat(breakdownByItem, totalAmount),  
      vatCalculator.calculate(items))
  }
  
  // test for reduced VAT rate
  // test for zero VAT rate
  // test for multiple items with mixed VAT rate classes
}
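On the provider side, one possible shape of the extracted component is sketched below in plain Kotlin. Everything here is hypothetical: the VatRate categories and their percentages are placeholders rather than real tax rules, amounts are whole minor units (e.g. cents) with each item's VAT rounded down as a simplification, and, unlike the no-argument constructor in CalculateVatTest, this version takes the SKU-to-rate lookup as a constructor parameter.

```kotlin
// Placeholder rate categories; the percentages are illustrative only.
enum class VatRate(val percent: Long) { STANDARD(23), REDUCED(8), ZERO(0) }

// Amounts in minor units (e.g. cents) to keep the arithmetic exact.
data class TaxableItem(val sku: String, val netAmount: Long)
data class Vat(val breakdownByItem: List<Long>, val totalAmount: Long)

interface VatCalculator {
    fun calculate(items: List<TaxableItem>): Vat
}

// One possible provider of the VatCalculator contract: looks up each item's
// rate category and rounds its VAT down to whole minor units (a simplification).
class ThreeCategoriesVatCalculator(
    private val rateOf: (sku: String) -> VatRate
) : VatCalculator {
    override fun calculate(items: List<TaxableItem>): Vat {
        val breakdown = items.map { it.netAmount * rateOf(it.sku).percent / 100 }
        return Vat(breakdown, breakdown.sum())
    }
}
```

All the exhaustive VAT cases (standard, reduced, zero, mixed) can now be tested against this class alone, while the use case test only needs one stubbed answer.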

Which approach is the best?

Both and neither of them. They are just tools to keep in your coder’s toolbelt. The nice thing about the two approaches is that you can switch between them, deliberately or instinctively, when working on a feature. You can start in the middle, throw in some code to get an insight into the solution, and introduce some mocks when the complexity starts overwhelming you. Then you work your way up and down (until you hit the infrastructure). Once you have more clarity about the design, you inject mocks before the test feedback forces you to do so. Or you can start from the top (whatever the top means for you: the user stimulus or an entry point deeper in your code) and apply mocks to fiddle with your design.

Don’t suppress design feedback with mocks

Don’t choose hard-to-test code as the boundary to replace with a double. If a module is problematic to instantiate or is tightly coupled to the outside world (imagine a testing nightmare: a considerable number of hard-coded collaborators talking directly to the infrastructure), don’t hide it behind mocks. The difficulty is feedback about the design; address it instead of suppressing it.

As for every rule, there are exceptions. When you are working with legacy code and you are trying desperately to cover it with tests, no holds barred — mock whatever you need to be able to execute and verify the legacy system.

Mock behavior, not values

In the previous section, I presented two ways of introducing mocks — as a design or refactoring tool. But what exactly should be mocked?

Look at the SUT and its collaborators through a functional lens: there are behaviors (or computations) and values (or state). Behaviors are abstract transformations (or mappings) between types (a DDD service is an example of a behavior). Values are concrete members of a type. Concrete behaviors (functions) are values too, but here I’m focusing on abstract behaviors, where you know the source and target types but not exactly how one is mapped to the other.

When choosing what to mock, replace computations, not values, with test doubles. To give the rationale behind this advice, let’s look again through two perspectives — the mockist and the classist.
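A minimal Kotlin sketch of the distinction, using hypothetical names (Customer, DiscountPolicy, PriceQuote): the value is built with its constructor, and only the abstract behavior is replaced, here by a plain lambda rather than a mocking library.

```kotlin
// A value: a plain record. Tests build it with its constructor; no double needed.
data class Customer(val id: Int, val loyaltyYears: Int)

// A behavior: an abstract computation from Customer to a discount percentage.
// This is the kind of collaborator worth replacing with a test double.
fun interface DiscountPolicy {
    fun discountFor(customer: Customer): Int
}

// Hypothetical SUT that uses both the value and the behavior.
class PriceQuote(private val policy: DiscountPolicy) {
    fun quote(customer: Customer, netAmount: Long): Long =
        netAmount * (100 - policy.discountFor(customer)) / 100
}
```

In a test, PriceQuote { 10 } stubs the policy with a fixed 10 % discount, while Customer(id = 1, loyaltyYears = 3) is the real value; there is no getter contract to verify anywhere.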

The mockist way — mocks as a design tool

Mocks define a contract between the SUT (client) and a provider (which needs to be implemented separately). The contract is a kind of requirement specification: we express what we need from a collaborator, but we don’t indicate how the agreement has to be fulfilled. A contract makes sense only where there is an abstract behavior. In the case of values, there is nothing to specify; the value itself determines its usage. For example, if you have a record or product type, you know its members and you access (or pattern match on) them directly. There is no abstraction here.

If you work with a class-oriented language, you may ask whether you need to introduce interfaces to be able to replace collaborators with doubles. The answer is no: both concrete classes and interfaces define a function type. Compare:

class Repository {
  fun save(invoice: Invoice): Unit { /* ... */ }
}

interface Repository {
  fun save(invoice: Invoice): Unit
}

The save function has the same type, Invoice -> () (written (Invoice) -> Unit in Kotlin), regardless of the language construct used to represent the type.
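A small Kotlin sketch (hypothetical names ClassRepository, InterfaceRepository, saverOf) showing that a save method bound to an instance has the type (Invoice) -> Unit whichever construct declares it, and that a hand-written recording double has the same shape for both. One Kotlin-specific caveat: classes are final by default, so the class is marked open here to allow subclassing; MockK, by contrast, can mock final classes directly.

```kotlin
data class Invoice(val id: Int)

// A concrete class, marked open so that a hand-written double can subclass it.
open class ClassRepository {
    open fun save(invoice: Invoice) { /* would persist the invoice */ }
}

// An interface declaring the same operation.
interface InterfaceRepository {
    fun save(invoice: Invoice)
}

// Bound to an instance, both save methods have the same function type.
fun saverOf(repository: ClassRepository): (Invoice) -> Unit = repository::save
fun saverOf(repository: InterfaceRepository): (Invoice) -> Unit = repository::save

// Recording doubles: the same shape for both constructs.
class RecordingClassRepository : ClassRepository() {
    val saved = mutableListOf<Invoice>()
    override fun save(invoice: Invoice) { saved += invoice }
}

class RecordingInterfaceRepository : InterfaceRepository {
    val saved = mutableListOf<Invoice>()
    override fun save(invoice: Invoice) { saved += invoice }
}
```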

The classist way — mocks as test complexity reducer

You add mocks to remove test smells (duplicated and obscure tests). With current mocking libraries, nothing stops you from introducing test doubles for values. However, when you replace a class that is a state bearer with a stub, you actually add complexity to the test instead of removing it. As I wrote in the previous section, there is a new contract to be checked on the provider side. This boils down to writing tests for field accessors (or getters), a mindless activity that doesn’t provide any value. Instead, create the value with its constructor and feed the SUT with it.

Hard-to-create values

Values should be simple. But what if a value is simple, yet hard to create? If the instantiation is hard, don’t suppress this feedback with a mock; resolve the root cause instead. Is there logic in the constructor that belongs somewhere else? Are collaborators required to construct the value? Maybe a new behavior or concept is peeking out and asking to be moved outside of the constructor.
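A minimal Kotlin sketch of that refactoring, with hypothetical names throughout: before, creating an invoice requires a number sequence (a collaborator); after, the invoice is a plain record and numbering becomes a behavior of its own, which is where a double may legitimately go.

```kotlin
// Hypothetical behavior: hands out consecutive invoice numbers (e.g. from a DB sequence).
fun interface InvoiceNumberSequence {
    fun next(): Long
}

// Before: the constructor calls a collaborator, so every test that merely
// needs an invoice value must first build or stub a number sequence.
class InvoiceBefore(sequence: InvoiceNumberSequence, val customer: String) {
    val number: Long = sequence.next()
}

// After: the value is a plain record that any test can construct directly...
data class Invoice(val number: Long, val customer: String)

// ...and numbering becomes a behavior of its own, with its own tests,
// which can be replaced by a double where a SUT depends on it.
class InvoiceFactory(private val sequence: InvoiceNumberSequence) {
    fun create(customer: String): Invoice = Invoice(sequence.next(), customer)
}
```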

Blended behavior and state

What to do when your SUT works together with a class that mixes values and computations? This is the essence of object orientation, isn’t it? If the attached behavior calculates derived properties, you can treat them as values, and test them indirectly together with the SUT (no mocks here). If the computations are complex (and maybe they need help from external collaborators), and you test them indirectly together with the SUT, then you may introduce a problem I mentioned before — an integrated test with all its problems (remember — test case explosion, hard defect localization, and obscure test).

What can you do in the latter case?

- Don’t replace the dependent object with a test double. Test the happy path (of the dependent object’s logic) together with the SUT, and exercise secondary or non-happy paths in an isolated test. I’m personally not inclined to this solution: if the primary path of the object is checked in multiple tests and there is a bug, multiple tests will fail.
- Treat it as test feedback. If it’s hard to mock, reconsider the class responsibilities. Should they be split and moved somewhere else? Does the class have the right API?

Watch out for cases (or just temptations) to introduce partial test doubles. Partial mocks or stubs replace only the logic you want to keep out of the test and leave the literal values out of the mocking scope, so you work with them directly. They are another smell pointing you to complexity or an awkward design.
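For the simpler case described above, where the attached behavior only derives properties from the object’s own state, a minimal Kotlin sketch (hypothetical InvoiceLine and PaymentRequest) of testing the derived property indirectly through the SUT with a real instance, with no double involved:

```kotlin
// Mixed state and behavior: the derived property is a pure computation
// over the object's own fields, so the object can be treated as a value.
data class InvoiceLine(val netAmount: Long, val vatAmount: Long) {
    val grossAmount: Long get() = netAmount + vatAmount
}

// Hypothetical SUT relying on the derived property.
class PaymentRequest(private val lines: List<InvoiceLine>) {
    fun amountDue(): Long = lines.sumOf { it.grossAmount }
}
```

A test simply builds real InvoiceLine instances and checks amountDue(); grossAmount is covered along the way.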

Don’t mock what is not yours

Some people use test doubles to replace code belonging to external libraries. What are the reasons behind that practice?

- You don’t know how to plug the external library into the test. Or you tried and failed.
- It is hard to assert the outcome of the test when the external code is used. Maybe you need to provide extra infrastructure to perform the checks. For example, you need an external, managed HTTP server to ensure that the HTTP client performing a request reached it and got the expected response.
- You know how to work with the external library in the test context, but the test execution is slow.

Basically, you want to avoid integration tests and to have the benefits of unit tests — fast, repeatable, and predictable execution.

Why do you need to write integration tests? You want to ensure that the external code (integrated with yours) does the job you expect from it. For example, if it connects to storage to save the state, you want to be sure that the command really persisted the state. If it’s an HTTP client, you check that it reaches the remote server and processes the response it gets back.

You shouldn’t use mocks and stubs to replace external dependencies (you may use fakes, but hold on a little longer). I’ll present my arguments against mocking what you don’t own, again from two perspectives: the mockist and the classist.

The mockist way — mocks as a design tool

If you use mocks/stubs as a design tool, introduced test doubles define a contract binding the client and the provider (the implementation behind the interface replaced by the mock). The client is the owner of the contract. It decides what is required by the provider.

Your responsibility as a developer is to fulfill the contract on the provider side. You have to be able to adjust the provider to the new contract if the client is modified. In the case of external code, you, as the client representative, can neither define a new version of the contract nor tweak the provider side. You take the contract with the external party as they defined it.

The classist way — mocks as test complexity reducer

If you introduced mocks/stubs to reduce test complexity (the classist way) and replaced an external collaborator with them, you may think that you did reduce test complexity. But can your tests be trusted?

Are your tests really checking whether the external provider (which you don’t control) returns the answers expected by the SUT, or performs the side effect that is going to change the state of the system? How do you verify that? The only way to provide evidence that the external library does what it promises is to execute it.

There is no way around the integration tests — you have to write them. But you can mitigate some of their negative characteristics.

Keep the integration test scope as narrow as possible

Split your code into two parts:

- One free from external dependencies. Put any logic there. Verify it with unit tests.
- One that talks to the external world. Keep it as dumb and straightforward as possible. Cover it with integration tests.

Take this sample Repository verified by the integration test:

class Repository {
  fun findById(id: Int): Invoice {
    val request = adaptToApiRequest(id)
    // execute uses an external HTTP library
    val response = execute(request)
    return adaptToDomain(response)
  }
}

Extract the request execution into a separate module (let’s call it HttpClient). Test Repository with a unit test mocking the client. Narrow the integration test scope only to the newly extracted client.

// unit-tested
class Repository(val httpClient: HttpClient) {
  fun findById(id: Int): Invoice {
    val request = adaptToApiRequest(id)
    val response = httpClient.execute(request)
    return adaptToDomain(response)
  }
}

// integration-tested
class HttpClient {
  fun execute(request: ApiRequest): ApiResponse = …
}
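A sketch of what the unit test of Repository can then look like without any network, written in plain Kotlin with a hand-rolled stub instead of a mocking library. Assumptions, all hypothetical: HttpClient is extracted as an interface (the article’s version is a class, which a mocking library can replace just as well), the request path "/invoices/<id>" and the "<id>;<customer>" body format are invented for illustration.

```kotlin
// Hypothetical wire and domain types.
data class ApiRequest(val path: String)
data class ApiResponse(val status: Int, val body: String)
data class Invoice(val id: Int, val customer: String)

// Assumption: the extracted client is an interface, so a plain stub can implement it.
interface HttpClient {
    fun execute(request: ApiRequest): ApiResponse
}

// Unit-tested part: the adaptation logic around the client call.
class Repository(private val httpClient: HttpClient) {
    fun findById(id: Int): Invoice {
        val request = adaptToApiRequest(id)
        val response = httpClient.execute(request)
        return adaptToDomain(response)
    }

    private fun adaptToApiRequest(id: Int) = ApiRequest("/invoices/$id")

    private fun adaptToDomain(response: ApiResponse): Invoice {
        // Toy wire format assumed for illustration: "<id>;<customer>"
        val (id, customer) = response.body.split(";")
        return Invoice(id.toInt(), customer)
    }
}

// Stub that records the requests it receives and returns a canned response.
class StubHttpClient(private val cannedResponse: ApiResponse) : HttpClient {
    val requests = mutableListOf<ApiRequest>()
    override fun execute(request: ApiRequest): ApiResponse {
        requests += request
        return cannedResponse
    }
}
```

Only the real HttpClient implementation then needs an integration test against an actual server.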

Isolate dependencies or use fakes

Gone are the bad old days when you had to use a shared database to perform an integration test. External components can be isolated in throwaway local containers. Using Docker for testing can eliminate erratic tests by ensuring that each execution starts from a known state.

If the container takes long to start (where long can be a couple of seconds), you can try a good old alternative: fakes, the forgotten test double. They provide the same API as the original but have reduced capabilities (an in-memory database has no durable storage). You are trading fidelity for performance, especially with in-process fakes. Sometimes the external library already provides fakes for testing; notable examples are Spring’s MockMvc and MockRestServiceServer. The key question here is whether you trust the provider’s promise that the replacement behaves the same as the original.
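A minimal in-memory fake in Kotlin, with hypothetical names (InvoiceRepository, InMemoryInvoiceRepository, checkSaveThenFind). The shared check function illustrates one way to build trust in a fake: run the same contract checks against the fake and, in an integration test, against the real implementation.

```kotlin
data class Invoice(val id: Int, val customer: String)

// The contract shared by the real (e.g. MySQL-backed) repository and the fake.
interface InvoiceRepository {
    fun save(invoice: Invoice)
    fun findById(id: Int): Invoice?
}

// In-memory fake: same API, reduced capabilities (no durability, no
// transactions; the state lives only as long as the object does).
class InMemoryInvoiceRepository : InvoiceRepository {
    private val invoices = mutableMapOf<Int, Invoice>()
    override fun save(invoice: Invoice) { invoices[invoice.id] = invoice }
    override fun findById(id: Int): Invoice? = invoices[id]
}

// Contract check meant to run against both implementations.
// It expects an empty repository as its starting point.
fun checkSaveThenFind(repository: InvoiceRepository) {
    val invoice = Invoice(1, "ACME")
    repository.save(invoice)
    check(repository.findById(1) == invoice) { "a saved invoice must be found" }
    check(repository.findById(2) == null) { "an unknown id must not be found" }
}
```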

Exceptions to the rule

If you have got this far, I hope I have convinced you that you shouldn’t mock strangers. Sometimes, however, it is convenient to stub them. There are special cases and boundary conditions that are hard or impossible to simulate with real dependencies. For example, how can you check how your code behaves when your HTTP client encounters a network I/O error?

For those special and rare cases (if testing them brings real value), I do recommend using mocks. They are like secret agents sent for exceptional missions — not a general levy summoned to invade your integration tests.
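A minimal Kotlin sketch of such a secret agent: a stub client that always fails with java.io.IOException, used to exercise an error path that a real server rarely produces on demand. The error handling under test (mapping the failure to an Unavailable result) and all names here are hypothetical; HttpClient is assumed to be a fun interface so a lambda can serve as the stub.

```kotlin
import java.io.IOException

data class ApiRequest(val path: String)
data class ApiResponse(val body: String)

// Assumption: a client interface owned by our code, implementable with a lambda.
fun interface HttpClient {
    fun execute(request: ApiRequest): ApiResponse
}

// Hypothetical outcome type for the lookup.
sealed class Lookup {
    data class Found(val body: String) : Lookup()
    object Unavailable : Lookup()
}

// Hypothetical SUT: turns network I/O failures into an Unavailable result.
class InvoiceGateway(private val httpClient: HttpClient) {
    fun find(id: Int): Lookup =
        try {
            Lookup.Found(httpClient.execute(ApiRequest("/invoices/$id")).body)
        } catch (e: IOException) {
            Lookup.Unavailable
        }
}
```

In a test, InvoiceGateway { throw IOException("connection reset") } triggers the failure path deterministically, without any network.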

Too Long; Skipped the Article

So, when should you use mock objects? For those who skimmed over the article, here is a summary in under 100 words:

- Introduce mocks and stubs to focus tests on specific parts of the production code and isolate them from the rest.
- Mocks/stubs describe a contract: what the System Under Test needs from its dependencies. Your duty is to implement the provider part of the contract.
- Replace only your own code with test doubles. Only then are you able to fulfill the contract.
- Mocks are not a free lunch. Use them to model abstract computations, not values.

Written by Marcin Gryszko