
Replay Time Travelogue: How Replay MCP Helped Find a React Bug Faster than Dan Abramov Did

An example of how Replay MCP enables agents to find and fix deep bugs
Mark Erikson
We’ve always said that Replay time-travel recordings enable developers to solve really hard bugs by giving them the ability to inspect the app’s behavior at any point in time. This is especially true for timing bugs and race conditions, which may be impossible to debug with standard developer tools at all - pausing in a debugger can keep the race condition from ever happening. Now, with Replay MCP, we’re giving agents those same time-travel investigation superpowers.
I recently tried using Replay MCP to investigate a complex React internals bug that Dan Abramov had already investigated and eventually fixed via an agent (but only after some failed attempts). Would Replay enable an agent to find the right answer? And how long would it take?

Background: The React useDeferredValue Bug

In early February, former React core team member and well-known React expert Dan Abramov filed React issue #35821: useDeferredValue gets stuck with a stale value. He reported that he had seen useDeferredValue get “stuck” in prod builds and never re-render with the updated result. He included a fairly minimal repro with two textboxes: type in the first textbox, and the entries get sent to the server and reflected into the second textbox via useDeferredValue. This worked fine in dev builds, but the stuck behavior could be reproduced semi-consistently in prod builds.
A month later, Dan filed React PR #36134: Fix useDeferredValue getting stuck. The actual fix was 4 lines of code deep in ReactFiberWorkLoop.js to ensure React’s internal “lanes” data structure got updated properly.
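To make the “lanes” part concrete: React tracks pending work as bitmasks called lanes, and a suspended render’s lane has to be “pinged” back into the scheduler once its data arrives. Here’s a heavily simplified mental model of that bookkeeping - a sketch of the concept, not React’s actual source, with illustrative function bodies:

```javascript
// Simplified mental model of React's "lanes" bookkeeping -- NOT React's
// actual source. Lanes are bitmasks; each bit represents a unit of work.
const DeferredLane = 0b0100;

function createRoot() {
  return { suspendedLanes: 0, pingedLanes: 0 };
}

// When a render suspends (e.g. waiting on data), its lane is parked.
function markRootSuspended(root, lane) {
  root.suspendedLanes |= lane;
  // The pinged state for this lane is cleared here; if it never gets
  // restored, the scheduler forgets the work exists and the UI is stuck.
  root.pingedLanes &= ~lane;
}

// When the data arrives, the lane must be "pinged" so the scheduler retries.
function pingSuspendedRoot(root, lane) {
  if (root.suspendedLanes & lane) {
    root.pingedLanes |= lane; // the essence of the fix: re-add the lane
  }
}

// Work is scheduled only for lanes that have been pinged again.
function hasPendingWork(root) {
  return root.pingedLanes !== 0;
}

const root = createRoot();
markRootSuspended(root, DeferredLane);
console.log(hasPendingWork(root)); // false: parked, waiting for a ping
pingSuspendedRoot(root, DeferredLane);
console.log(hasPendingWork(root)); // true: the ping restored the lane
```

If the ping path skips the `pingedLanes |= lane` update (which is roughly the failure mode the PR addresses), the root stays suspended forever and the deferred value never re-renders.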
Interestingly, the PR itself was described as “Written/debugged by Claude”.

Dan’s Agent Investigation Thoughts

Dan later clarified that the “month” was really just two sessions a month apart: a first session where he tried to have Claude build a repro and it failed, and a second session where he instructed it to add logs and got the correct solution.
I think Dan’s point about “information over time” is critical and 100% accurate.
This is exactly why Replay exists, and why we’ve built Replay MCP!
Once you have a Replay recording of a bug, you can investigate it as much and as deeply as you want. The runtime execution becomes data you can query. When did React render, and why? How many times did a given line of code execute? What was the value of x every time this line of code ran?
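That last question - the value of an expression every time a line ran - is exactly what a logpoint gives you. As a plain-JavaScript analogy (illustrative only; Replay evaluates logpoints against the recording, not your live code):

```javascript
// Toy illustration of what a logpoint provides: the value of an
// expression every time a given line of code runs.
const hits = [];

function processItem(x) {
  // Imagine a logpoint on this line with the expression `x * 2`:
  hits.push({ line: "processItem:2", value: x * 2 });
  return x * 2;
}

[1, 2, 3].forEach((x) => processItem(x));
console.log(hits.map((h) => h.value)); // [2, 4, 6]
```

The difference is that with Replay, you add the logpoint *after the fact*, against an execution that already happened, and get every hit back at once.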
Given that, I wanted to compare how well a standard AI agent could investigate the same bug, given only the bug report and access to Replay recordings of the issue.

Agentic Time-Travel Debugging with Replay MCP

Let’s step back and recap what Replay MCP actually is.
Replay is a time-travel debugger for web apps. The Replay recording browser captures DVR-style recordings of an entire web app. Because we capture the entire browser’s behavior and inputs, we can replay the app’s execution exactly the way it ran during the original recording. Unlike session replay or prod monitoring tools, Replay lets you do time-travel debugging and inspect the app’s behavior at any point in time: see which lines of code executed, evaluate log statements for every time a line ran, view the DOM tree and React component tree at any point in time, and much more. This makes it possible to investigate and solve bugs in ways no other tool can.
Replay DevTools is our debugging UI for humans - browser devtools with time-travel built in. Replay MCP gives agents those same time-travel debugging capabilities. Agents can open a recording and use the MCP tools to investigate the same way a human would: looking at console messages, adding logpoints to evaluate expressions each time a line of code ran, getting screenshots and stack traces, and getting framework-specific insights into libraries like React, Redux, Zustand, and TanStack Query. This means agents can now do the investigation work for you automatically!

Investigation Process and Setup

For this experiment, I used my own personal agent setup: OpenCode 1.4 and Opus 4.6. I have some file search and context management plugins enabled, but otherwise no specific skills or custom behaviors.
In other investigations, I’ve found that the context and investigation prompts have a huge influence on the results: telling an agent how deep to go, giving it directions on scientific method steps, providing context on the available codebase.
I did an initial run to see if the agent could even get close to the correct answer. I was thrilled to see that just by analyzing the Replay recordings, the agent successfully identified the root cause and the fix in under 10 minutes!
That alone is an amazing result. As Dan described, his own agent initially struggled with the investigation, and only succeeded later, once he used Andrew Clark’s hint and had it rebuild React with logging added. In comparison, just having Replay recordings available to investigate was enough for an agent to solve it right away!
With that as a baseline, I set up a proper experiment: how much do prompts and investigation instructions matter? I kicked off four parallel agent investigation sessions. Each agent session was given the same access to the Replay recordings of the bug and a local copy of the demo app source, but with varying prompt instructions:
  1. Light details, bug repro, only told to “write a bug report with a root cause and suggested fix”
  2. Same repro steps, but a detailed 8-step investigation methodology including explaining why the problem is happening
  3. Additional summary of React’s internal scheduling system concepts
  4. Additional list of Replay MCP Tools and their purposes
How would they do? What differences would we see between them in results or investigation times?

Investigation Session Results

I was thrilled to see that with Replay recordings of the bug available, all 4 agent sessions successfully used Replay MCP to nail the actual root cause and produce valid suggested fixes, all in under 30 minutes!
Here’s how they tackled the investigation.

Agent #1 (Basic Instructions): 28 Minutes

Agent #1 was given the least context and instructions, so it spent the most time trying to orient itself and understand React’s internals. It also went down the most rabbit holes chasing false leads :) After starting with the Replay MCP RecordingOverview tool and seeing the prod recording ending with a render commit mismatch and no SuspenseResumed commit, it dug into the implementation of useDeferredValue. It got stuck on promise semantics for several minutes before concluding it was a scheduling problem.
It used the Logpoint tool in pingSuspendedRoot and confirmed there were ping issues. It took another 10 minutes of tracing through RSC promise resolution, including checking call stacks to confirm portions of the call stack were synchronous, before it found the relevant Suspense ternary condition and analyzed that.
It ultimately proposed three fixes: fixing the pingSuspendedRoot ternary (matching PR #36134), making pings async, and updating pinged lanes after renders complete.
This was the longest session, but it did a remarkably thorough job of tracing through the complexity of React’s internals and using Replay MCP Logpoints and other tools to understand what was going on.
Agent #1 prompt
Agent #1 final analysis

Agent #2 (Investigation Methodology): 17 Minutes

Agent #2 also started with RecordingOverview and quickly identified the commit mismatch. It then used Replay MCP’s React render trigger details to trace the causation chain from keystrokes to render commits.
It made its way to pingSuspendedRoot and used the Logpoint tool to check the hits and values inside. It got briefly sidetracked on scheduling behavior before coming back to the rendering logic.
Along the way, it actually identified a second potential bug that none of the other runs found. It spotted an isThenableResolved(thenable) call that checks whether a promise has resolved, and identified that RSC promises use a different status value: "resolved_model" instead of "fulfilled". It suggested a tweak to this logic to allow resuming synchronously instead of re-throwing.
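As a simplified model of the mismatch it spotted (reconstructed from the behavior described here, not copied from React’s source): a resolved-state check that only accepts the ordinary statuses will treat an RSC promise whose status is "resolved_model" as still pending:

```javascript
// Simplified model of the status mismatch -- not React's actual source.
// Ordinary resolved thenables carry status "fulfilled" (or "rejected"),
// but per the agent's finding, RSC promises use "resolved_model".
function isThenableResolved(thenable) {
  const status = thenable.status;
  return status === "fulfilled" || status === "rejected";
}

const ordinaryThenable = { status: "fulfilled", value: "data" };
const rscThenable = { status: "resolved_model", value: "serialized model" };

console.log(isThenableResolved(ordinaryThenable)); // true
console.log(isThenableResolved(rscThenable)); // false: treated as still
// pending, so the work is re-thrown instead of being resumed synchronously
```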
In the end it also suggested the common “re-ping lanes” fix that the other agents and the PR had, but this was a fascinating insight, and derived solely from inspecting the runtime behavior in this one recording.
Agent #2 prompt
Agent #2 final analysis

Agent #3 (React Scheduling Concepts): 8 Minutes

Agent #3 started with RecordingOverview to check the output, and Screenshot to confirm the problem visually. Since it already had the terminology to describe React’s scheduling internals, it made a beeline to functions like markRootSuspended, even without any actual knowledge of the implementation.
It used the Logpoint tool to check hits for pingSuspendedRoot and markRootSuspended, and quickly identified the relevant ternary logic as the culprit. It then compared the line hit counts against the dev recording, and came up with the pinged lanes solution without any major rabbit trails.
Agent #3 prompt
Agent #3 final analysis

Agent #4 (React Concepts + Replay Tools Overview): 7 Minutes

Agent #4 started off the same way as #3, with RecordingOverview and Screenshot tool calls. After scanning the source for pingSuspendedRoot in the React bundle, it tried to use Logpoint calls but struggled a bit with the syntax, so it switched to Evaluate instead to similarly retrieve real values in scope at various execution points.
It found the relevant ternary logic in 4 minutes, confirmed the issue and compared with the dev build, did some red-team review, and wrote up the final report with the correct pinged lanes solution.
This run was the most efficient - it read the source, knew where to instrument, and produced the best final report.
Agent #4 prompt
Agent #4 final analysis

Analyzing the Agent Results

I went into this hoping that having Replay recordings available would help prove that Replay runtime data makes it easier to solve hard bugs. I’ve certainly experienced that myself just working on Replay and using it over the last few years! So it was extremely satisfying to see that every single one of the agent runs was able to find this complex React bug and propose the correct fix, based only on the Replay recordings of the bug!
As Dan noted: his own agent wasn’t able to solve the issue itself. It wasn’t until Andrew Clark pointed him in the right direction that his agent was able to add the right log calls to React’s source, rebuild, analyze the logs, repeat the process, and eventually converge on the right answer.
Replay MCP’s tools gave my agents the ability to analyze the runtime behavior without having to keep rebuilding React! The Logpoint tool allowed agents to dynamically evaluate an expression every time a line of code ran, while the Sources tool and its built-in “hit counters per line” values acted as an impromptu profiler to help guide the investigation.
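The “hit counters per line” idea is simple to picture: for every line in a bundled source file, the recording knows exactly how many times that line executed. A toy model in plain JavaScript (illustrative only - Replay derives these counts from the recording itself, with no instrumentation in your code):

```javascript
// Toy model of per-line hit counts acting as an impromptu profiler.
const hitCounts = new Map();

function hit(lineId) {
  hitCounts.set(lineId, (hitCounts.get(lineId) ?? 0) + 1);
}

function render(items) {
  hit("render:1"); // function entry
  for (const item of items) {
    hit("render:3"); // a hot line shows up immediately in the counts
  }
}

render(["a", "b", "c"]);
render(["d"]);
console.log(hitCounts.get("render:1")); // 2: render ran twice
console.log(hitCounts.get("render:3")); // 4: the loop body ran four times
```

A line that ran zero times in the prod recording but many times in the dev recording is exactly the kind of signal that pointed the agents at the broken ping path.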

Comparing Agent Behaviors

Looking at the prompts and the results, I see a few key differences:
| Dimension | Run 1 (Baseline) | Run 2 (+Method) | Run 3 (+Context) | Run 4 (+Tools) |
| --- | --- | --- | --- | --- |
| Duration | ~28 min | ~17 min | ~8 min | ~7 min |
| Messages | 154 | 111 | 47 | 42 |
| Context compressions | 6 | 4 | 2 | 2 |
| Found ternary bug | ✅ | ✅ | ✅ | ✅ |
| Fix matches PR #36134 | ✅ (Fix A of 3) | Partial (Fix B) | ✅ | ✅ |
| Found isThenableResolved gap | — | ✅ (unique) | — | — |
| Proved sync call chain | ✅ (GetStack) | ✅ (ReadSource) | — | — |
| Quantitative hit counts | — | — | ✅ (best) | — |
| Dev vs prod comparison | — | — | ✅ | ✅ |
| False lead time | ~8 min | ~4 min | ~1 min | <1 min |
| Red-team analysis | Moderate | Moderate | Light | Best |
| Number of fixes proposed | 3 | 2 | 1 | 1 |
| Bottom-up exploration | Extensive | Moderate | Minimal | Minimal |
Even with the most basic instructions, just having the recording and MCP tools was enough to let Agent #1 eventually figure out the root cause and propose a valid solution for an otherwise extremely difficult bug.
We’ve all seen this over the last year, but it’s still incredible to me that an AI can dive into a codebase or problem space and orient itself just by reading some files and produce real value.

Prompting and context are still critical

Just giving a more detailed step-by-step investigation pattern cut the investigation time in half. A few paragraphs of “here are some concepts of what React’s scheduling internals involve” cut it in half again. And as I’ve seen in some other investigations I’ll talk about in a future post, having some relevant skills files available can produce drastically better investigation results.
Dan said his agent “instrumented the React codebase with logs”, and really needed “information over time”. That’s exactly what Replay MCP provided! Tools like RecordingOverview and ReactRenders to surface info on React behavior and error messages; Logpoint and Evaluate to extract specific values at various points in time; Sources to view source files in the bundle and see the hit counts to understand execution; Screenshot to visually inspect the UI at a given point in time; NetworkRequests to check the requests and results; and even more niche tools like GetStack and DescribePoint to inspect the JS execution flow.
Replay MCP provides all these and more, enabling agents to actually understand the runtime behavior over time, without having to rebuild the app with more logging. Capture the recording once with all the tricky timing behavior, investigate as deeply as needed, automatically.
What do Replay MCP’s tools look like?

Replay MCP: The Time-Travel Superpower Your Agents Need

I joined Replay because I saw the potential and promise of time-travel debugging and how it can make the debugging process drastically faster and better. It enables solving bugs that are otherwise impossible, and provides insights into real runtime behavior that no other monitoring or analysis tool can provide.
Replay DevTools gave humans the ability to investigate and solve bugs with time travel.
Now, Replay MCP gives your agents the same time-travel superpowers.
You can add Replay MCP to your own agents and workflows today! Plug it in, make a recording of a bug or a failed test, and let your agent do the time-travel investigative work for you.
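For reference, most MCP-capable agent clients are wired up with a small JSON config entry along these lines - note that the command and package name below are placeholders, not Replay’s actual values, so check the Replay MCP docs for the real setup:

```json
{
  "mcpServers": {
    "replay": {
      "command": "npx",
      "args": ["<replay-mcp-package>"]
    }
  }
}
```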
And, coming soon: we’re working on expanding our existing E2E Playwright and Cypress recording integrations to automatically investigate test failures and provide analysis and recommended fixes! This will help your team fix flaky tests, ensure PR code quality, and improve shipping velocity.
Try out Replay MCP and our E2E Test Suites integrations today, and stay tuned - we’ve got a lot more awesome time-travel debugging capabilities coming soon!