Write Tests You Love, Not Hate #5 - Understanding the Fragile Test Problem
What if tests break with every minor change, even though the behavior of the system hasn't changed? To understand the Fragile Test Problem, we dive into an example and identify the underlying issue.
Sometimes, you might make minor changes in your implementation only to spend days fixing the tests afterward. This is an extreme—and blunt—example of the Fragile Test Problem.
In this case, tests are highly sensitive to changes that don’t affect the actual behavior of the system (often called refactorings). They generate many false positives, making them untrustworthy and unreliable. This, in turn, significantly slows down development.
In this post, we examine the problem and identify one of its root causes.
The Fragile Test Problem
Figure 1 shows a typical test scenario. On the left-hand side, you see the implementation, and on the right-hand side, the tests. We have a UserService that uses a UserRepository. A test called UserServiceTest checks the behavior of the UserService by mocking the UserRepository. So far, so good. Now, imagine that the system evolves, and after several months, the team decides the UserService has become too large and needs to be split.
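To make the starting point concrete, here is a minimal sketch of the Figure 1 setup in Java. Only the names UserService, UserRepository, and UserServiceTest come from the article; the User type, the methods findById and displayNameFor, and the use of JUnit 5 with Mockito are assumptions for illustration.

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

// Hypothetical domain type, introduced only for this sketch.
class User {
    final String name;
    User(String name) { this.name = name; }
}

// The dependency of the service under test.
interface UserRepository {
    User findById(long id);
}

// The service under test; the uppercasing logic is a stand-in for real behavior.
class UserService {
    private final UserRepository repository;

    UserService(UserRepository repository) {
        this.repository = repository;
    }

    String displayNameFor(long id) {
        return repository.findById(id).name.toUpperCase();
    }
}

// The test mocks the repository and checks the service's behavior.
class UserServiceTest {
    @Test
    void returnsUppercasedDisplayName() {
        UserRepository repository = mock(UserRepository.class);
        when(repository.findById(42L)).thenReturn(new User("alice"));

        UserService service = new UserService(repository);

        assertEquals("ALICE", service.displayNameFor(42L));
    }
}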
They decide to extract multiple services, such as UserProfileService, UserLoginService, and UserNotificationService, from the UserService, as illustrated in Figure 2. However, this change breaks the tests, as indicated by the red color of the UserServiceTest.
To rectify this, the team mocks the behavior of the new services in the UserServiceTest, which then completes successfully (see Figure 3). Yet, the code moved to the new services is now untested.
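Continuing the sketch from above (the shape of the extracted services is again an assumption), the split might look like this: UserService is reduced to delegation, and the patched UserServiceTest mocks the new services instead of the repository.

// The extracted services; UserProfileService now owns the moved logic.
class UserProfileService {
    private final UserRepository repository;

    UserProfileService(UserRepository repository) {
        this.repository = repository;
    }

    String displayNameFor(long id) {
        return repository.findById(id).name.toUpperCase();
    }
}

class UserLoginService { /* details omitted */ }
class UserNotificationService { /* details omitted */ }

// UserService is reduced to wiring and delegation.
class UserService {
    private final UserProfileService profiles;
    private final UserLoginService logins;
    private final UserNotificationService notifications;

    UserService(UserProfileService profiles, UserLoginService logins,
                UserNotificationService notifications) {
        this.profiles = profiles;
        this.logins = logins;
        this.notifications = notifications;
    }

    String displayNameFor(long id) {
        return profiles.displayNameFor(id);
    }
}

// The patched test mocks the new services instead of the repository.
class UserServiceTest {
    @Test
    void delegatesDisplayNameToProfileService() {
        UserProfileService profiles = mock(UserProfileService.class);
        when(profiles.displayNameFor(42L)).thenReturn("ALICE");

        UserService service = new UserService(profiles,
                mock(UserLoginService.class),
                mock(UserNotificationService.class));

        assertEquals("ALICE", service.displayNameFor(42L));
    }
}

Note that this test now only verifies delegation: the mock answers the canned "ALICE", so it passes even though the uppercasing logic that moved into UserProfileService is no longer exercised anywhere.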
Thus, the team must create a new set of tests for each of the new services (see Figure 4) to test their behavior individually.
To make the new tests work, they need to mock the UserRepository and, of course, implement the proper checks and verifications. Finally, the team finishes the new tests, and everything works nicely, as you can see in Figure 5.
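Still under the same assumptions, a dedicated test for the extracted UserProfileService again mocks the UserRepository and checks the moved logic directly; tests for the other two services would follow the same pattern.

// The new test mirrors the original UserServiceTest almost line for line,
// but targets the extracted service.
class UserProfileServiceTest {
    @Test
    void returnsUppercasedDisplayName() {
        UserRepository repository = mock(UserRepository.class);
        when(repository.findById(42L)).thenReturn(new User("alice"));

        UserProfileService profiles = new UserProfileService(repository);

        assertEquals("ALICE", profiles.displayNameFor(42L));
    }
}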
However, life goes on, and after some time, the team concludes that their chosen setup was suboptimal. They decide to make changes once again and restructure their code.
As a result, three of their tests break again, as shown in Figure 6. Since functionality has moved in the implementation, the tests also need to be moved and adjusted.
After a few days of work, the tests are back up and running.
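The post leaves this second restructuring abstract, but the mechanics are always the same. As one purely hypothetical instance, continuing our sketch: if displayNameFor migrates from UserProfileService into a new UserAccountService, both the delegation test and UserProfileServiceTest name a type and a method that no longer exist, so they break at compile time.

// Hypothetical second restructuring: the moved logic moves once more,
// this time into a new UserAccountService.
class UserAccountService {
    private final UserRepository repository;

    UserAccountService(UserRepository repository) {
        this.repository = repository;
    }

    String displayNameFor(long id) {
        return repository.findById(id).name.toUpperCase();
    }
}

// Every test that named the old home of the method now fails to compile
// and must be rewritten rather than merely re-run, e.g. the stub
//
//     when(profiles.displayNameFor(42L)).thenReturn("ALICE");
//
// no longer matches any method on UserProfileService.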
But is this really the outcome we want? Let's consider what happened here. Besides being painful for the team, this episode highlights several problems with common testing practices.
This is no refactoring!
The first issue is that none of the steps described previously qualify as refactorings. Refactoring is defined as a sequence of small changes that preserve the system's observable behavior; performed properly, it keeps the tests passing at all times.
The key principle bears repeating: changes should keep the tests passing at all times. However, this was not the case in our example, where the tests broke on several occasions and required adjustments.
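For contrast, here is a change to the original sketch that does qualify as a refactoring: extracting a private helper inside UserService. The structure changes, the observable behavior does not, and UserServiceTest keeps passing untouched. (As before, the helper name is an invention for illustration.)

// Same behavior as the original UserService, restructured internally;
// no test needs to change.
class UserService {
    private final UserRepository repository;

    UserService(UserRepository repository) {
        this.repository = repository;
    }

    String displayNameFor(long id) {
        return normalize(repository.findById(id).name);
    }

    // Extracted helper: a small, behavior-preserving step.
    private String normalize(String name) {
        return name.toUpperCase();
    }
}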
Why is this problematic?
In our scenario, we were essentially forced to write new tests after the initial changes to the UserService and to make significant adjustments following the second change. Consequently, we cannot be sure that the behavior of the system after the changes is identical to its previous state.
The fundamental purpose of our tests is to enable modifications that change the system's structure while guaranteeing that its behavior stays the same.
When we are forced to essentially rewrite the tests, we forfeit this advantage. The outcome is akin to not having any tests at all, except with additional effort involved. We can no longer guarantee that we haven’t introduced bugs or unwanted behaviors.
Tight Coupling
Let’s consider our example from a different perspective. Instead of focusing on implementation and test code, let’s discuss two separate components, A and B.
Upon examining the connections between these two components as depicted in Figure 8, it becomes evident that they are tightly coupled. Each element in Component B is directly linked to at least one element in Component A. Therefore, any change in Component A necessitates a corresponding change in Component B. As we've learned in our Software Engineering 101 course, tight coupling is always a bad idea.
Looking at the problem from this perspective already points to the solution: to solve the Fragile Test Problem, we need to decouple the tests from the implementation. How to achieve this will be explored in the next post.
Thank you for reading ❤️