Changelog 68: Fixing test failures with AI

Hi everyone, this week we published a post on the second demo in our series, which automatically fixes a browser test failure using AI and Replay-based analysis. Given only the failure logs, LLMs struggle to understand the problem. Analyzing and describing the immediate cause of the failure lets the LLM both explain the problem and develop a fix reliably.

We believe that most test failures you run into day to day can be fixed completely automatically with the right combination of AI and analysis. We want an experience in the near future where, after you create a PR and see those dispiriting red X’es as the tests run, an agent files its own PR against your branch a few minutes later to fix those failures. You can review and merge its changes without even looking at the failures, and get back the time you would have spent investigating them.

Reliably and automatically fixing test failures is necessary for a full-fledged AI developer to make improvements on its own to an established code base. We’re not ready for that quite yet, but this piece of the puzzle is enough to help now, and it offers a more in-depth improvement than what we built with Replay for Test Suites, where we focused on helping you fix the hardest flakes.

Also, unlike Replay for Test Suites, this project is speculative: we’re in largely unexplored territory. To move this forward, we need to see and understand the failures you’re running into so we can design the right analyses to fix them effectively and save real time. If you’d like to try this out and help us, let us know by emailing hi@replay.io or filling out our contact form and we’ll be in touch.