
A Journey of Driving Down Test Flakes to 0% at Metabase - Part 3

In this part, we look into a bug where a React component calls Math.floor and gets a different value when the test passes and when it fails!
Filip Hric
For the past six months, we’ve been on a journey of helping Metabase drive down their test failures and test flakes. Along the way we have discovered many cases worth sharing. In part 1, we debugged a case where the application ran faster than the test. Part 2 covered a case in which a CSS change altered the outcome of the test.
This time, the culprit is a rounding step inside a React component: the same calculation produces a different page size in the passing run and in the failing run.

Case study: Pagination table

A usual go-to strategy for debugging a flaky test is to compare a passing run with a failing run and find where they differ. The two runs examined in this case diverged at line 32:
[Figure: passing test]
[Figure: failing test]
The assertion was supposed to find 8 elements on the page, but it found only 7. There was also a slight visual difference between the components in the examined test runs: the failed run had a slightly larger gap between the last item in the table and the pagination element.
[Figure: passing test]
[Figure: failing test]
Right from the beginning it was obvious that this was not a data problem. Both tables show 51 elements, and an examination of the network panel confirmed that both tests received the same data.
[Figure: network panel]
This time, looking at the React panel was a good debugging choice. Finding the table component was a pretty easy task, and it revealed the key function.
The TableSimple component has a lot going on, but the most important part is how it calculates currentPageSize, which decides how the items are paginated and how many elements are shown on the current page.
With Replay, we can add a print statement, console.log("Size without rounding", (height - headerHeight - footerHeight) / (rowHeight + 1)), to see why currentPageSize differs between the passing and the failing test.
In the passing test the calculation returned 8.01, which explained why we sometimes see 8 elements and sometimes only 7: the value sits right at a rounding boundary. When we looked into each input, such as height, headerHeight and footerHeight, we discovered that height was 467 in the failing test and 469 in the passing test. With 2 pixels less, the result drops just below 8. A 2 pixel difference made the test fail!
The root of the problem was how the result of this formula was rounded. The calculation always rounds down, using the Math.floor() function, so even though the difference was just 2 pixels, we would end up with 7 items in the table instead of 8.
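To make the arithmetic concrete, here is a minimal TypeScript sketch of the calculation. Only the two height values (469 and 467) come from the investigation; headerHeight, footerHeight and rowHeight are made-up numbers chosen so the result lands near 8, and the function names are hypothetical rather than taken from the Metabase code.

```typescript
// Only `height` (469 vs 467) is taken from the investigation.
// All other measurements are assumed values for illustration.
interface TableMeasurements {
  height: number;
  headerHeight: number;
  footerHeight: number;
  rowHeight: number;
}

// Same expression as in the print statement, before any rounding.
function rowsThatFit(m: TableMeasurements): number {
  return (m.height - m.headerHeight - m.footerHeight) / (m.rowHeight + 1);
}

// Rounds down, as the component is described as doing.
function pageSizeWithFloor(m: TableMeasurements): number {
  return Math.floor(rowsThatFit(m));
}

// Hypothetical header, footer and row sizes.
const assumed = { headerHeight: 50, footerHeight: 50, rowHeight: 45 };
const passingRun: TableMeasurements = { ...assumed, height: 469 };
const failingRun: TableMeasurements = { ...assumed, height: 467 };

console.log(rowsThatFit(passingRun).toFixed(2), pageSizeWithFloor(passingRun)); // 8.02 8
console.log(rowsThatFit(failingRun).toFixed(2), pageSizeWithFloor(failingRun)); // 7.98 7
```

With these assumed sizes, the 2 pixel change moves the unrounded value from just above 8 to just below it, and Math.floor turns that tiny shift into a whole missing row.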
The behavior of this table is mostly fine, since it generally makes sense to round down and avoid accidentally cutting off items in the table. But in this particular case, a small random change shifted the height of the table and resulted in unexpected behavior. One fix could be to use Math.round() instead of Math.floor(), so that the result is rounded to the nearest integer. Alternatively, the test could be adjusted to accept either 7 or 8 elements rendered on the page.
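Both options can be sketched in a few lines of TypeScript. The helper names and the sample row counts below are hypothetical and only illustrate the trade-off; they are not the actual Metabase fix.

```typescript
// Hypothetical helpers comparing the two rounding strategies.
function pageSizeFloor(rows: number): number {
  return Math.floor(rows);
}

function pageSizeRound(rows: number): number {
  return Math.round(rows);
}

// Option 2: a tolerant check the test could use instead of expecting exactly 8.
function isAcceptableRowCount(
  rendered: number,
  accepted: readonly number[] = [7, 8],
): boolean {
  return accepted.indexOf(rendered) !== -1;
}

// Illustrative unrounded values on both sides of the boundary.
const justBelow = 7.98;
const justAbove = 8.02;

console.log(pageSizeFloor(justBelow), pageSizeFloor(justAbove)); // 7 8
console.log(pageSizeRound(justBelow), pageSizeRound(justAbove)); // 8 8

// Trade-off: rounding to nearest can also round up past what fits.
console.log(pageSizeFloor(8.6), pageSizeRound(8.6)); // 8 9

console.log(isAcceptableRowCount(7), isAcceptableRowCount(9)); // true false
```

Math.round removes the flake at the 8-row boundary, but for a value like 8.6 it would request 9 rows where only 8 fully fit, which is the clipping that rounding down avoids. The tolerant assertion leaves the component untouched and moves the flexibility into the test instead.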

Conclusion

This case study shows that when it comes to modern e2e testing, even small and obscure changes can have a significant impact. 2 pixels don’t seem like much, but they had the power to alter the behavior of the application.
This is where Replay is unique in its capabilities, as finding this difference by reproducing the failure would take ages or be close to impossible. Replay allows you to see what happened, but more importantly, it allows you to see how things happened. Diving deep into React components and seeing how values change over time, or how they differ from one test run to another, makes debugging much simpler. This means that reducing test flakes to 0% is an ambitious but achievable goal.
💡 If you experience flaky tests and need more insight, come talk to us on our Discord, or sign up to try Replay on your own.