Effective test doubles, part 1

Marcin Gryszko
Nov 11, 2020

Twenty years ago, Endo-Testing: Unit Testing with Mock Objects, the first paper on mock objects, was published. Its authors described the usage of real object replacements to improve both the production and the test code. They named them Mock Objects after the Mock Turtle, a creature from Alice in Wonderland that believed it was once a real turtle. Just like the story about Alice (surreal, psychedelic, and apparently nonsensical), mock objects are still not fully understood by the software community.

In part 1, I’m going to share some tactical tips about the efficient and effective usage of mock objects; part 2 deals more with the mocking strategy. The article will help you to employ test doubles as a useful tool to validate your design and tame test complexity. Mock objects are not considered harmful after all!

In the rest of the article I’ll be using test double terminology (mock, spy, stub) as defined in Martin Fowler’s Mocks Aren’t Stubs.

Don’t verify stubs

You introduce stubs to provide answers to queries made by the System Under Test (SUT) to its collaborators — so-called indirect inputs. Those replies are used later by the SUT in its behaviour — either to transform the input arguments to the invocation result, or to make other queries, or to perform side effects (directly or indirectly).

You don’t have to verify stub invocations. You will know that the stubbed dependency was called with the expected arguments and that its response was used in the SUT logic by examining the SUT output. And you will know that the dependency was not called when the final state doesn’t match the expected one.

Consider this example of invoice creation:

fun createInvoice(..., freeformAddress: String, ...): Invoice {
    val structuredAddress = addressNormalizer.normalize(freeformAddress)
    return Invoice(..., address = structuredAddress, ...)
}

and its corresponding test (with MockK):

val structuredAddress = Address(street = "::street::", ...)
every { addressNormalizer.normalize("::address::") } returns structuredAddress

val invoice = createInvoice(..., freeformAddress = "::address::", ...)
assertThat(invoice).isEqualTo(Invoice(..., address = structuredAddress, ...))

If the addressNormalizer stub is not called with the right arguments, or its response is not used in the invoice creation, the result (or the final state) assertion will fail, giving you a hint that the collaboration didn't take place.

Don’t set up call cardinality on stubs

It’s a bad idea to verify how many times a stub was called. You shouldn’t actually care about this; you care only about what your production code does with the result of a stub call (and, as I wrote previously, you check it by inspecting the final SUT state). It doesn’t matter if you ask your peer once or a thousand times — the SUT outcome should remain the same.

Stubs replace a real dependency, expressing the syntactic and semantic contract with the collaborator. Most of the time you don’t know and don’t care what concrete implementation fulfills that contract. But sometimes you do want to check how many times the stub was called. You may know that in real life the stubbed operation is costly (e.g. due to a long-running query) and you want to ensure that it happens only once.

In that situation, you can add the call cardinality verification. However, you have to be aware that you are losing the abstraction that the stub represents. You are no longer working with the abstract stub contract; you have in mind a concrete implementation of the stub interface. You introduce a hidden coupling of the test to the collaborator implementation, even if you don’t explicitly reference it in the test.

Is this good or bad? It depends. You have to make a tradeoff — either you want to have an isolated test or you want to document that there is a performance constraint on your production code.
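If you decide the tradeoff is worth it, the extra verification could look like the sketch below. It assumes MockK is on the test classpath; InvoiceService, AddressNormalizer and the simplified Invoice and Address types are hypothetical stand-ins for the invoice example, not code from a real project. The result assertion already covers the collaboration; the verify(exactly = 1) line is the deliberate coupling that documents the performance constraint.

```kotlin
import io.mockk.every
import io.mockk.mockk
import io.mockk.verify

// Hypothetical, simplified domain types for illustration only.
data class Address(val street: String)
data class Invoice(val address: Address)

interface AddressNormalizer {
    fun normalize(freeformAddress: String): Address
}

// Hypothetical SUT that queries the normalizer once per invoice.
class InvoiceService(private val addressNormalizer: AddressNormalizer) {
    fun createInvoice(freeformAddress: String): Invoice =
        Invoice(address = addressNormalizer.normalize(freeformAddress))
}

fun main() {
    val addressNormalizer = mockk<AddressNormalizer>()
    val structuredAddress = Address(street = "::street::")
    every { addressNormalizer.normalize("::address::") } returns structuredAddress

    val invoice = InvoiceService(addressNormalizer).createInvoice("::address::")

    // The result assertion proves the stub was called and its answer was used.
    check(invoice == Invoice(address = structuredAddress))

    // Deliberate extra coupling: normalization is assumed to be expensive,
    // so the test documents that it must happen only once.
    verify(exactly = 1) { addressNormalizer.normalize("::address::") }
}
```

A short comment next to the cardinality check, as in the sketch, tells future maintainers that the coupling is intentional.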

Always verify mocks

You introduce mocks to check if side effects were triggered by the SUT (by side effects, I mean observable side effects — those being significant in our program). You don’t want to directly check the global state change caused by a side effect — you only ensure that the behaviour happened (and how many times). In other words, you put in place a mock as a replacement for commands, or indirect outputs.

In contrast to stubs, you should always verify mocks, including how many times they were called. You check (indirectly) that the desired side effect was requested through the collaborator interface. Then, in a separate (and isolated) test, you validate the concrete side effect. For example, you will confirm that the file system or the database was changed as expected.

Remember to verify the mock call cardinality. If you don’t do this (i.e. you allow one to many mock executions) then you are no longer sending an abstract message to the mocked collaborator. You have a concrete implementation in mind where you know that the underlying side effect is idempotent — and this is an extra latent coupling between the test and the real dependency.
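The following sketch shows why the explicit cardinality matters. It assumes MockK, whose verify block checks for at least one call when no cardinality is given; DoubleSavingInvoiceService is a hypothetical SUT with a deliberate bug, and Invoice and InvoiceRepository are simplified stand-ins.

```kotlin
import io.mockk.Runs
import io.mockk.every
import io.mockk.just
import io.mockk.mockk
import io.mockk.verify

data class Invoice(val id: Int)

interface InvoiceRepository {
    fun save(invoice: Invoice)
}

// Hypothetical SUT with a bug: it requests the side effect twice.
class DoubleSavingInvoiceService(private val repository: InvoiceRepository) {
    fun create(invoice: Invoice) {
        repository.save(invoice)
        repository.save(invoice)
    }
}

fun main() {
    val repository = mockk<InvoiceRepository>()
    val invoice = Invoice(id = 1)
    every { repository.save(invoice) } just Runs

    DoubleSavingInvoiceService(repository).create(invoice)

    // Without a cardinality, the check is "at least once": the bug slips through.
    verify { repository.save(invoice) }

    // With an explicit cardinality, the duplicated side effect is detected.
    val strictVerification = runCatching {
        verify(exactly = 1) { repository.save(invoice) }
    }
    check(strictVerification.isFailure)
}
```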

Prefer mock libraries over hand-crafted test doubles

Some people prefer writing hand-crafted spies instead of using mocking libraries. The arguments for this approach are increased performance and fewer external dependencies. Hand-made doubles are faster to create since no bytecode manipulation is needed. But the price you have to pay is a lot of boilerplate code you have to write to manually implement your spies.

Consider a simple interface and its spying test implementation:

interface InvoiceRepository {
    fun save(invoice: Invoice)
}

Compare the hand-made spy:

class TestingInvoiceRepository : InvoiceRepository {
    lateinit var savedInvoice: Invoice
    var calledTimes = 0

    override fun save(invoice: Invoice) {
        savedInvoice = invoice
        calledTimes++
    }
}

with the one-liner Mockito implementation:
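A sketch of what such a one-liner could look like, assuming plain Mockito (org.mockito.Mockito) called from Kotlin. Invoice is a simplified stand-in, and the direct save call in main only simulates what the SUT would do.

```kotlin
import org.mockito.Mockito.mock
import org.mockito.Mockito.times
import org.mockito.Mockito.verify

data class Invoice(val id: Int)

interface InvoiceRepository {
    fun save(invoice: Invoice)
}

fun main() {
    // The one-liner: Mockito generates the spying implementation at runtime.
    val invoiceRepository: InvoiceRepository = mock(InvoiceRepository::class.java)

    val invoice = Invoice(id = 1)
    invoiceRepository.save(invoice) // stands in for the SUT calling its collaborator

    // Equivalent of inspecting savedInvoice and calledTimes on the hand-made spy.
    verify(invoiceRepository, times(1)).save(invoice)
}
```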


Even if you write your spies in a terse language like Kotlin, it’s still a lot of code. In this example I implemented only one interface function; imagine that you have three collaborators, each of them providing 2–3 interface methods. How much code do you have to write? What if you have to return a different value on each test double invocation?

My recommendation is to make use of a mocking library (Mockito, MockK, JMock — you name it) instead of your own testing implementations. Don’t repeat other people’s hard work. You won’t decrease test execution time but you will reduce time and effort dedicated to the test maintenance.

You don’t need a library-provided test double for simple queries:

val lazyInt: Supplier<Int> = Supplier { 1 }
val featureToggle: () -> Boolean = { true }

These one-liner test doubles are as simple as (or even simpler than) those created by the library of your choice.

Strict mocks by default

Mocking libraries can detect if there were unwanted or redundant interactions with test doubles. The strictness depends on the library: JMock forces you to describe each and every call to a mock/stub (and fails fast if the execution goes off the rails); Mockito can detect configured but not executed interactions; MockK stops on an unplanned test double call (but it doesn’t detect unused mock configurations).

Don’t turn these features off: don’t use lenient() Mockito doubles, don’t set up relaxed MockK doubles, and don’t set allowing expectations on JMock mocks (reserve them for queries to stubs). Your tests should act as the documentation of the code you are checking. Keeping unused mock configurations lying around makes the test more obscure to the reader. Future test maintainers have to analyze the production and test code in order to separate signal from noise: does the test double setup produce the desired effect on the SUT behaviour, or is it just an irrelevant piece of code that can be safely deleted?
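To illustrate the difference with MockK, here is a minimal sketch comparing a default (strict) double with a relaxed one. FeatureToggles is a hypothetical collaborator invented for this example.

```kotlin
import io.mockk.mockk

// Hypothetical collaborator, for illustration only.
interface FeatureToggles {
    fun isEnabled(name: String): Boolean
}

fun main() {
    // Default MockK double: an unplanned call fails immediately.
    val strict = mockk<FeatureToggles>()
    val unplannedCall = runCatching { strict.isEnabled("::feature::") }
    check(unplannedCall.isFailure)

    // Relaxed double: the same unplanned call silently returns a default value,
    // so the test no longer documents which interactions the SUT relies on.
    val relaxed = mockk<FeatureToggles>(relaxed = true)
    check(!relaxed.isEnabled("::feature::"))
}
```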

Verify arguments passed to test doubles

In the last section, I discussed why you should use test doubles in the strict mode (detecting unplanned interactions automatically). In the same way, you should strictly verify the arguments passed to test doubles. This means comparing the arguments with the expected values by their equality and not using any() style matchers (you may refer to my article about Painless Assertions for more details about equality in tests).

Instead of matching the argument with an any() style matcher, pass the expected object to the mock specification:
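To make the contrast concrete, here is a sketch assuming MockK. RegressedInvoiceService is a hypothetical SUT with a deliberate regression (it saves a different invoice than the one it received), and the stub is set up with any() only so that the faulty call gets through to the verification step.

```kotlin
import io.mockk.Runs
import io.mockk.every
import io.mockk.just
import io.mockk.mockk
import io.mockk.verify

data class Invoice(val id: Int)

interface InvoiceRepository {
    fun save(invoice: Invoice)
}

// Hypothetical SUT with a regression: it saves the wrong invoice.
class RegressedInvoiceService(private val repository: InvoiceRepository) {
    fun create(invoice: Invoice) {
        repository.save(Invoice(id = -1))
    }
}

fun main() {
    val repository = mockk<InvoiceRepository>()
    every { repository.save(any()) } just Runs
    val expectedInvoice = Invoice(id = 1)

    RegressedInvoiceService(repository).create(expectedInvoice)

    // Loose specification: passes although the wrong invoice was saved.
    verify(exactly = 1) { repository.save(any()) }

    // Equality-based specification: the regression is caught.
    val strictVerification = runCatching {
        verify(exactly = 1) { repository.save(expectedInvoice) }
    }
    check(strictVerification.isFailure)
}
```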


You should be following this practice for various reasons:

  • You want your tests to act as living documentation of how the object/function/module churns out the output by using its logic and external collaboration. By specifying any(), you omit the details of this collaboration.
  • You want your tests to catch regressions or unintended changes in the production code. Imagine somebody malicious replacing the argument matched as any() with null, or a different object instance. Is this behaviour intended or just a programmer mistake? A robust SUT->test double collaboration specification will tell you that.

As usual, there are exceptions to this rule. I always configure my mocks using equality-based assertions in the happy path scenario and in any other scenario where the arguments passed to the mock depend on the SUT logic. But I do use any() matchers for the unhappy or edge-case scenarios, like checking error handling, to avoid duplication in the test.

Take this example of the successful invoice creation. I compare what is being saved in the repository with the expected invoice:

fun `invoice created - happy path`() {
    val invoice = Invoice(id = 1)

    // SUT invocation

    // verify that the repository mock received save(invoice), compared by equality
}


When I want to check how the errors from my collaborator are transformed, and I don’t care what is passed to the mock, I use an any-like matcher, knowing that the other test case already verified the argument in the message to the mock:

fun `error handling`() {
    // stub the collaborator to fail, matching its argument with an any-like matcher

    assertFailsWith(InvoiceNotCreatedException::class) {
        // SUT invocation
    }
}
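A runnable version of the error-handling case might look like the sketch below. It assumes MockK; InvoiceService, its create function, and the way it wraps repository failures into InvoiceNotCreatedException are hypothetical, chosen only to give the any() matcher something to work with.

```kotlin
import io.mockk.every
import io.mockk.mockk

data class Invoice(val id: Int)

interface InvoiceRepository {
    fun save(invoice: Invoice)
}

class InvoiceNotCreatedException(cause: Throwable) : RuntimeException(cause)

// Hypothetical SUT: translates collaborator failures into a domain exception.
class InvoiceService(private val repository: InvoiceRepository) {
    fun create(invoice: Invoice) {
        try {
            repository.save(invoice)
        } catch (e: RuntimeException) {
            throw InvoiceNotCreatedException(e)
        }
    }
}

fun main() {
    val repository = mockk<InvoiceRepository>()
    // The argument is irrelevant here; the happy path test already pins it down.
    every { repository.save(any()) } throws IllegalStateException("::storage unavailable::")

    val failure = runCatching { InvoiceService(repository).create(Invoice(id = 1)) }
        .exceptionOrNull()

    check(failure is InvoiceNotCreatedException)
}
```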

Repetition of test doubles

Watch out for repeated use of the same test double: the same call to a mock or stub repeated over and over in multiple test cases. Usually the call is required to reach some execution point in the production code where there is significant logic to be verified in the test.

This duplication is not intrinsically bad; it is the feedback the test is giving you about the design. Maybe it asks you to extract a new collaborator. Maybe you have to rethink and redistribute responsibilities. Don’t sweep that feedback under the rug, for example by extracting helper setup methods with the test double configuration. Review your design, experiment with alternatives, and see if the duplication is reduced or ideally removed.

If you want to go deeper into good and not-so-good test helpers, check the article:

That’s all about the mocking tactics. In the second part, I give you guidelines on when to introduce a test double (and when not to) and what should be replaced by a mock/stub.


