
The Replayability Roadmap

Jason Laster
We focus on Time Travel Debugging because it addresses the problem of reproducibility, but deterministic replay has broader applications.
This post will touch on some of these use cases, but our larger goal is to share a roadmap for where Replay’s Protocol is headed and what that means for dynamic analysis at scale.

Dynamic Analysis

Dynamic Analysis is the study of how software executes over time. Today’s tools rely on instrumentation, which works well for alerting and observability use cases but does not support post-hoc dynamic analysis.
Replay records the essential non-determinism in the runtime, which is enough to replay a program exactly as it ran before. This lets us support a map/reduce-style control flow analysis today, and we are beginning to explore Object Persistence APIs that will support data flow analysis later this year.
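As a rough illustration of what a map/reduce-style analysis could look like on top of the protocol, here is a minimal TypeScript sketch that counts how often each function executed across a recording. The `Session`, `findSources`, and `getHitCounts` names are hypothetical stand-ins, not the real protocol surface.

```ts
// Hypothetical client types -- not the actual Replay Protocol surface.
interface Source { sourceId: string; url: string; }
interface HitCount { functionName: string; hits: number; }
interface Session {
  findSources(): Promise<Source[]>;
  getHitCounts(sourceId: string): Promise<HitCount[]>;
}

// Map: query hit counts per source. Reduce: aggregate into one table of
// how often each function executed across the whole recording.
async function hottestFunctions(session: Session): Promise<Map<string, number>> {
  const sources = await session.findSources();
  const perSource = await Promise.all(
    sources.map((s) => session.getHitCounts(s.sourceId)) // map step
  );
  const totals = new Map<string, number>();
  for (const counts of perSource) { // reduce step
    for (const { functionName, hits } of counts) {
      totals.set(functionName, (totals.get(functionName) ?? 0) + hits);
    }
  }
  return totals;
}
```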
Software analysis falls into two categories: static and dynamic. Products like GitHub Copilot have shown what is possible with static analysis at scale. At a macro level, Replay is a bet that there are similar opportunities with dynamic analysis at scale.

Visual Regression Testing

Differential analysis focuses on the changes between execution traces. One popular use case is Visual Regression Testing, which detects perceptual changes over time.
Let’s say we want to build a Visual Regression tool on top of Replay’s Protocol. The first step is to collect meaningful screenshots to compare. If two screenshots differ, we can find the elements that are different. Once we have the elements, we can fetch their attributes, applied rules, and computed properties to see why the elements differ.
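Here is a minimal sketch of that pipeline, assuming hypothetical `diffElements` and `getElementStyles` helpers on a recording client; the real protocol commands may differ.

```ts
// Hypothetical recording client -- method names are illustrative,
// not the real protocol commands.
interface ElementStyles {
  attributes: Record<string, string>;
  appliedRules: string[];
  computedProperties: Record<string, string>;
}
interface Recording {
  // Node ids of elements that render differently between the two recordings.
  diffElements(other: Recording, paintPoint: string): Promise<string[]>;
  getElementStyles(nodeId: string): Promise<ElementStyles>;
}

// For each element that changed, pull attributes, applied rules, and
// computed properties from both recordings so they can be compared.
async function explainVisualDiff(before: Recording, after: Recording, paintPoint: string) {
  const changed = await before.diffElements(after, paintPoint);
  for (const nodeId of changed) {
    const was = await before.getElementStyles(nodeId);
    const now = await after.getElementStyles(nodeId);
    console.log(nodeId, { was, now });
  }
}
```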
Let’s say an element is different because it has a new class. Replay can help us figure out why the class was added. If we know that the application is built with React, we can find the component that rendered the element and look at its props and state. With the props and state, we can see why the component rendered and why the element was given that class!
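Continuing the sketch, the React step might look like the following; `getReactComponent` and its return shape are assumptions for illustration only.

```ts
// Hypothetical API for walking from a DOM node back to the React
// component that rendered it, with its props and state at that point.
interface ReactInfo {
  componentName: string;
  props: Record<string, unknown>;
  state: Record<string, unknown>;
}
interface ReactRecording {
  getReactComponent(nodeId: string): Promise<ReactInfo>;
}

// Scan props and state for a value that plausibly controls the new class.
async function whyNewClass(rec: ReactRecording, nodeId: string, className: string) {
  const info = await rec.getReactComponent(nodeId);
  for (const [key, value] of Object.entries({ ...info.props, ...info.state })) {
    if (String(value).includes(className)) {
      console.log(`${info.componentName}: "${key}" likely drives class "${className}"`);
    }
  }
}
```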
Being able to compare two recordings is good, but our goal is to compare a corpus of recordings. A good example of the difference is TypeScript vs. Copilot: TypeScript statically analyzes a single repository, while Copilot statically analyzes a corpus of repositories. The same principle applies to dynamic analysis and the ability to observe many executions.

Intermittent Test Failures

End-to-end tests are easy to write but hard to maintain. Every company we have talked to has identified intermittent test failures as a drain on developer productivity and continuous development. Test runners often build in automatic retries, but issues like network latency and code churn make intermittent failures impossible to solve at scale with retries alone.
There are three ways in which Replay can help:
  1. Test failure categorization: If a test ran a thousand times and failed one hundred times, we can compare the runs and categorize the failures. Replay will bucket the failures and show you the most common one, e.g. an API call that is slow 5% of the time (a sketch follows this list).
  2. Root cause analysis: If we believe the test is failing because of a slow API call, in the future we will be able to test that theory by modifying the recording to simulate speeding up the API call. If the test then passes, we have confirmed the root cause.
  3. Automated remediation: If we know the test fails when the API call is slow, in the future we will be able to find the function that depended on the data being available, modify the code, rerun the test, and see if it passes. If it does, we can open a PR with the suggested fix.
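As a rough sketch of the categorization step, the TypeScript below buckets failing runs by a normalized error signature so the most common failure surfaces first; the `TestRun` shape is an assumption, not Replay’s actual API.

```ts
// Hypothetical shape of a recorded test run -- illustrative only.
interface TestRun {
  passed: boolean;
  failureMessage?: string; // e.g. "Timed out waiting for /api/cart"
}

// Bucket failing runs by a normalized error signature and sort by frequency,
// so the most common intermittent failure surfaces first.
function categorizeFailures(runs: TestRun[]): Array<[string, number]> {
  const buckets = new Map<string, number>();
  for (const run of runs) {
    if (run.passed) continue;
    // Strip volatile details (hex ids, numbers) so similar failures group together.
    const signature = (run.failureMessage ?? "unknown")
      .replace(/0x[0-9a-f]+/gi, "<id>")
      .replace(/\d+/g, "<n>");
    buckets.set(signature, (buckets.get(signature) ?? 0) + 1);
  }
  return Array.from(buckets.entries()).sort((a, b) => b[1] - a[1]);
}
```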

Conclusion

It is impossible to put an upper bound on the potential of dynamic analysis at scale. In 2019, OpenAI showed that it could train an AI to play Dota 2 at a world-class level given a corpus of replays and an environment conducive to deep learning. If something similar is possible for software, developers would no longer need to maintain their software by hand.
As it becomes possible to perform dynamic analysis at scale, replayability will revolutionize domains ranging from performance to privacy. We’re just getting started and could not be more excited to explore this space together.