Replay Time Travelogue: How Replay MCP Helped Find a React Bug Faster than Dan Abramov Did
An example of how Replay MCP enables agents to find and fix deep bugs
Mark Erikson
We’ve always said that Replay time-travel recordings enable developers to solve really hard bugs by giving them the ability to inspect the app’s behavior at any point in time. This is especially true for timing bugs and race conditions, which can be impossible to debug with standard developer tools, because pausing in a debugger blocks the race condition from ever happening. Now, with Replay MCP, we’re giving agents those same time-travel investigation superpowers.
I recently tried using Replay MCP to investigate a complex React internals bug that Dan Abramov had already investigated and eventually fixed via an agent (but only after some failed attempts). Would Replay enable an agent to find the right answer? And how long would it take?
Background: The React useDeferredValue Bug
In early February, former React core team member and well-known React expert Dan Abramov filed React issue #35821: useDeferredValue gets stuck with a stale value. He reported that he had seen useDeferredValue get “stuck” in prod builds and never re-render with the updated result. He included a fairly minimal repro with two textboxes: as you type in the first textbox, the entries get sent to the server and reflected into the second textbox via useDeferredValue. This worked fine in dev builds, but would semi-consistently get stuck in prod builds.
Dan later clarified that the monthlong timeline was really just two sessions a month apart: a first session where he tried to have Claude build a repro and it failed, and a second session where he instructed it to add logs and got the correct solution.
I think Dan’s point about “information over time” is critical and 100% accurate.
This is exactly why Replay exists, and why we’ve built Replay MCP!
Once you have a Replay recording of a bug, you can investigate it as much and as deeply as you want. The runtime execution becomes data you can query. When did React render, and why? How many times did a given line of code execute? What was the value of x every time this line of code ran?
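Conceptually, a logpoint turns “what was the value each time this line ran” into data you can query. As a rough analogy only (this is not Replay’s API or implementation, which works against a finished recording without re-running the app), here is what that question looks like in plain JavaScript:

```javascript
// Hypothetical stand-in for a logpoint: wrap a function so every call
// records its arguments and return value, turning execution into data.
// (Replay answers this against a recording; this is only an analogy.)
function withLogpoint(fn, hits) {
  return function (...args) {
    const result = fn(...args);
    hits.push({ args, result });
    return result;
  };
}

const hits = [];
const add = withLogpoint((a, b) => a + b, hits);
add(1, 2);
add(3, 4);

// "How many times did this line run, and what were the values?"
console.log(hits.length); // 2
console.log(hits[1].result); // 7
```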
Given that, I wanted to compare how well a standard AI agent could investigate the same bug, given only the bug report and access to Replay recordings of the issue.
Agentic Time-Travel Debugging with Replay MCP
Let’s step back and recap what Replay MCP actually is.
Replay is a time-travel debugger for web apps. The Replay recording browser captures DVR-style recordings of an entire web app. Because we capture the entire browser’s behavior and inputs, we can replay the app’s execution exactly the way it ran during the original recording. Unlike session replay or prod monitoring tools, Replay lets you do time-travel debugging and inspect the app’s behavior at any point in time: see which lines of code executed, evaluate log statements for every time a line ran, view the DOM tree and React component tree at any point in time, and much more. This makes it possible to investigate and solve bugs in ways no other tool can.
Replay DevTools is our debugging UI for humans: browser devtools with time-travel built in.

Replay MCP gives agents those same time-travel debugging capabilities. Agents can open a recording and use the MCP tools to investigate the same way a human would: looking at console messages, adding logpoints to evaluate expressions each time a line of code ran, getting screenshots and stack traces, and getting framework-specific insights into libraries like React, Redux, Zustand, and TanStack Query. This means agents can now do the investigation work for you automatically!
Investigation Process and Setup
For this experiment, I used my own personal agent setup: OpenCode 1.4 and Opus 4.6. I have some file search and context management plugins enabled, but otherwise no specific skills or custom behaviors.
In other investigations, I’ve found that the context and investigation prompts have a huge influence on the results: telling an agent how deep to go, giving it directions on scientific-method steps, and providing context on the available codebase.
I did an initial run to see if the agent could even get close to the correct answer. I was thrilled to see that just by analyzing the Replay recordings, the agent successfully identified the root cause and the fix in under 10 minutes!
That alone is an amazing result. As Dan described, his own agent struggled with the original investigation, and only succeeded later once he used Andrew Clark’s hint and had it rebuild React with logging added. In comparison, just having Replay recordings available to investigate was enough for an agent to solve it right away!
With that as a baseline, I set up a proper experiment: how much do prompts and investigation instructions matter? I kicked off four parallel agent investigation sessions. Each agent session was given the same access to the Replay recordings of the bug and a local copy of the demo app source, but with varying prompt instructions:
Light details, bug repro, only told to “write a bug report with a root cause and suggested fix”
Same repro steps, but a detailed 8-step investigation methodology including explaining why the problem is happening
Additional summary of React’s internal scheduling system concepts
Additional list of Replay MCP Tools and their purposes
How would they do? What differences would we see between them in results or investigation times?
Investigation Session Results
I was thrilled to see that with Replay recordings of the bug available, all 4 agent sessions successfully used Replay MCP to nail the actual root cause and produced valid suggested fixes, in under 30 minutes!
Here’s how they tackled the investigation.
Agent #1 (Basic Instructions): 28 Minutes
Agent #1 was given the least context and instructions, so it spent the most time trying to orient itself and understand React’s internals. It also went down the most rabbit holes chasing false leads :)
After starting with the Replay MCP RecordingOverview tool and seeing the prod recording end with a render commit mismatch and no SuspenseResumed commit, it dug into the implementation of useDeferredValue. It got stuck on promise semantics for several minutes before concluding it was a scheduling problem.
It used the Logpoint tool in pingSuspendedRoot and confirmed there were ping issues. It took another 10 minutes of tracing through RSC promise resolution, including checking call stacks to confirm portions of the call stack were synchronous, before it found the relevant Suspense ternary condition and analyzed that.
It ultimately proposed three fixes: fixing the pingSuspendedRoot ternary (matching PR #36134), making pings async, and updating pinged lanes after renders complete.
This was the longest session, but it did a remarkably thorough job of tracing through the complexity of React’s internals and using Replay MCP Logpoints and other tools to understand what was going on.
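The synchronous-call-stack behavior the agent confirmed comes from a thenable whose `.then` invokes its callback inline when the value is already available, unlike a native Promise, which always defers to a microtask. A minimal sketch of that hazard (this is illustrative, not the actual RSC client code):

```javascript
// A thenable that, like the RSC client's ReactPromise as described in the
// reports, calls its callback synchronously when it is already fulfilled.
const syncThenable = {
  status: "fulfilled",
  value: 42,
  then(onFulfilled) {
    // Inline invocation: runs inside the caller's stack, not a microtask.
    if (this.status === "fulfilled") onFulfilled(this.value);
  },
};

const order = [];
order.push("before then");
syncThenable.then(() => order.push("sync callback"));
order.push("after then");

// A native Promise callback would land after "after then"; this one fires
// in between, inside whatever code called .then() — e.g. mid-render.
console.log(order); // ["before then", "sync callback", "after then"]
```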
Agent #1 prompt
You are a debugging agent with access to Replay MCP tools and source code. Your job is to investigate a bug, identify the root cause, and suggest a fix.
Bug Report
App: Next.js app using React 19 with server actions. Source at ~/projects/temp/react-udv-bug.
Behavior:
A textarea input feeds text to a server action (renderAction) which returns Promise<ReactNode>
The promise is wrapped in useDeferredValue, then consumed via use() inside <Suspense>
In dev mode (next dev): the deferred output box tracks the input and catches up after typing stops
In prod mode (next build && next start): the deferred output frequently gets stuck and never catches up to the final input value
Recordings:
Dev (working): 06447062-f403-49f3-8bd8-476a65b448d2
In the broken prod recording, the user typed "hello world from prod" but the deferred output is stuck showing "hello world from pro" — one character behind, permanently.
What we know:
No console errors in either recording
The server action POST requests all return 200 OK with valid data
The app code is identical between dev and prod — only the build mode differs
It only manifests in production builds
Goal: Figure out why the deferred value stops updating in production and identify the root cause.
Deliverable
Write a detailed bug report with:
Bug Description — What happens
Root Cause Analysis — What exactly causes the deferred value to get stuck in prod but not dev
Suggested Fix — What code change would fix this
Key Evidence Trail — Table of tools used and findings at each step
Agent #1 final analysis
Bug Report: useDeferredValue Permanently Stalls in React 19 Production with Server Action RSC Payloads
1. Bug Description
When using useDeferredValue with a promise returned from a server action (the useDeferredValue(promise) pattern), the deferred output frequently gets permanently stuck in production builds. The deferred value stops updating and never catches up to the final input.
Repro: A textarea input calls renderAction(value) (a server action returning Promise<ReactNode>), wraps it in useDeferredValue, then consumes via use() inside <Suspense>. In dev mode, the deferred output tracks correctly. In prod, it stalls — e.g., user types "hello world from prod" but deferred output shows "hello world from pro" forever.
Affected version: React 19.3.0-canary (via Next.js 16.1.6 production build)
2. Root Cause Analysis
This is a race condition in React's concurrent renderer where a synchronous ping from an RSC lazy chunk fires during an active render, causing the deferred lane to be permanently lost.
The Bug Mechanism
The failure occurs in pingSuspendedRoot (react-dom-client.production.js L13073-13089) when three conditions converge simultaneously:
root.suspendedLanes = 0 — The lane hasn't been marked as suspended yet because markRootSuspended runs after the render completes, but the ping fires during the render
executionContext & 2 is truthy — The ping fires inside the render context
workInProgressRootExitStatus = 4 (RootSuspended) — Set by renderDidSuspendDelayIfPossible in throwException, before attachPingListener is called
```javascript
workInProgressRoot === root &&                          // TRUE
  (workInProgressRootRenderLanes & pingedLanes) &&      // TRUE
  (4 === workInProgressRootExitStatus                   // TRUE (RootSuspended)
    ? 0 === (executionContext & 2) &&                   // FALSE! In render context.
        prepareFreshStack(root, 0)                      // NOT CALLED
    : (workInProgressRootPingedLanes |= pingedLanes));  // NOT REACHED (wrong ternary branch)
```
The ternary evaluates 4 === workInProgressRootExitStatus as true, taking the true-branch where executionContext & 2 blocks prepareFreshStack. The else-branch (workInProgressRootPingedLanes |= pingedLanes) is never reached. The lane is lost with no recovery path.
How the Synchronous Ping Happens
The key trigger is the RSC (React Server Components) client's ReactPromise.prototype.then implementation (react-server-dom-turbopack-client.browser.production.js:539-563):
When attachPingListener calls .then(pingSuspendedRoot) on an RSC chunk that is already fulfilled (or transitions to fulfilled during initializeModelChunk), the ping callback fires synchronously, while still inside throwAndUnwindWorkLoop → throwException, which is inside renderRootConcurrent with executionContext = 2.
- Ping fires mid-render: executionContext=2, suspendedLanes=0, pingedLanes=0 → lane 524288 LOST
- 5191ms: render completes. markRootSuspended sets suspendedLanes=524288 but pingedLanes=0 — permanently stuck
Why Dev Mode Works
In development builds, the RSC client likely doesn't trigger synchronous resolution in .then(), or React's dev-mode Suspense handling processes thenables differently. The ping always fires asynchronously (when executionContext = 0), so root.suspendedLanes is properly populated by the time the ping runs.
Evidence: Dev recording shows 125 render commits (every keystroke gets both a SetState and SuspenseResumed). Prod shows only 38 commits, with the last being a SetState with no follow-up SuspenseResumed.
3. Suggested Fix
There are two complementary fixes, either of which would prevent this class of bug:
Fix A: Handle synchronous pings during render in pingSuspendedRoot
The ternary at L13078-13088 has a logic gap. When exitStatus === 4 and executionContext & 2, the code should still update workInProgressRootPingedLanes so the render can retry the pinged lanes when it completes:
```javascript
// In pingSuspendedRoot, after L13076:
if (workInProgressRoot === root) {
  if ((workInProgressRootRenderLanes & pingedLanes) === pingedLanes) {
    if (4 === workInProgressRootExitStatus || ...) {
      if (0 === (executionContext & 2)) {
        prepareFreshStack(root, 0);
      } else {
        // NEW: Even if we can't restart now, mark lanes as pinged
        // so the scheduler picks them up after render completes
        workInProgressRootPingedLanes |= pingedLanes;
      }
    } else {
      workInProgressRootPingedLanes |= pingedLanes;
    }
  }
}
```
Fix B: Prevent synchronous ping resolution from RSC chunks
In attachPingListener or in the RSC client's ReactPromise.prototype.then, ensure the ping callback is always invoked asynchronously (e.g., via microtask), so it never fires inside the render context:
```javascript
// In attachPingListener, wrap the ping:
wakeable.then(
  () => queueMicrotask(() => pingSuspendedRoot(root, pingedLanes)),
  () => queueMicrotask(() => pingSuspendedRoot(root, pingedLanes))
);
```
Fix C: Also update root.pingedLanes from workInProgressRootPingedLanes after render
In performWorkOnRoot, after renderRootConcurrent returns with exitStatus = 4 (RootSuspended), propagate any in-render pinged lanes to the root before calling markRootSuspended:
```javascript
// Before markRootSuspended in the exitStatus=4 path:
root.pingedLanes |= workInProgressRootPingedLanes;
```
Recommended approach: Fix A is the most targeted — it addresses the exact ternary logic gap without changing ping timing semantics. Fix B is a defense-in-depth measure. Fix C ensures the information isn't lost even if other paths hit the same issue.
4. Key Evidence Trail
Step
Tool
Finding
1
RecordingOverview (both recordings)
Dev: 125 commits, Prod: 38 commits. Same React 19.3.0-canary.
2
ReactRenders commits (prod)
Last commit #37 is SetState (2 fibers), no SuspenseResumed follows. Recording continues 5+ seconds.
3
NetworkRequest detail (prod)
All 21 POSTs return 200 OK. Final POST body "hello world from prod" returns correct data at 5158ms.
4
ReactRenders trigger-detail
Commit #37 triggered by keypress "d". Previous commit #36 was SuspenseResumed from network.
5
SearchSources / ReadSource
Found updateDeferredValueImpl (L5407-5424): sync renders defer, deferred renders accept.
6
Logpoint on updateDeferredValueImpl
Last 8 hits show hit #6 (5150ms) and #8 (5159ms) accepting deferred value — but no render commits follow.
7
DescribePoint at hit #8 (Point:162)
Promise is fulfilled with correct "hello world from prod" ReactNode.
8
SearchSources for pingSuspendedRoot
Found L13073-13089: lane pinged via `root.pingedLanes
9
Logpoint on pingSuspendedRoot L13076
3 pings after final keystroke. Pings 1-2 at 5158ms work (suspendedLanes=524800). Ping 3 at 5169ms fails (suspendedLanes=0, executionContext=2).
10
Logpoint on markRootSuspended L12149
4 hits. Hit 4 at 5191ms: suspendedLanes goes 0→524288 with pingedLanes=0. Permanent stuck state.
11
ReadSource + DescribePoint on throwException L6205-6218
renderDidSuspendDelayIfPossible() called BEFORE attachPingListener() — sets exitStatus=4 before ping can fire.
ReactPromise.prototype.then: if status is "fulfilled", calls resolve(this.value)synchronously.
14
ReadSourceperformWorkOnRoot L11847-11851
exitStatus=0 (yielded): no markRootSuspended called, so root.suspendedLanes stays 0 during re-entry render.
15
Evaluate at Point:182
Confirmed: root.suspendedLanes=0, root.pingedLanes=0, executionContext=2, workInProgressRootExitStatus=4. All conditions for the double failure met.
Summary
This is a React 19 scheduler bug where a synchronous ping from an RSC lazy chunk fires during an active concurrent render, hitting a logic gap in pingSuspendedRoot's ternary condition. The deferred lane is neither pinged on the root nor recorded in workInProgressRootPingedLanes, causing it to be permanently suspended with no recovery mechanism. The bug is prod-only because the RSC client in production resolves chunks synchronously in .then(), while the dev bundle resolves them asynchronously.
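The ternary gap the agent pinpointed can be modeled in a few lines of plain JavaScript. This is a simplified sketch of the control flow, not React’s actual source; the constants and variable names mirror the ones in the report:

```javascript
const RENDER_CONTEXT = 2; // executionContext flag for "inside a render"
const ROOT_SUSPENDED = 4; // workInProgressRootExitStatus for RootSuspended

// Simplified model of the pingSuspendedRoot branch described in the report:
// returns what happens to the pinged lane under the given conditions.
function modelPing({ exitStatus, executionContext }) {
  let workInProgressRootPingedLanes = 0;
  let restarted = false;

  if (exitStatus === ROOT_SUSPENDED) {
    if ((executionContext & RENDER_CONTEXT) === 0) {
      restarted = true; // prepareFreshStack path: retry immediately
    }
    // else: neither branch records the ping — the lane is lost
  } else {
    workInProgressRootPingedLanes |= 524288; // ping remembered for retry
  }
  return { restarted, pingedLanes: workInProgressRootPingedLanes };
}

// Ping arrives outside a render: safe, restarts immediately.
console.log(modelPing({ exitStatus: ROOT_SUSPENDED, executionContext: 0 }));
// { restarted: true, pingedLanes: 0 }

// Ping arrives DURING a render with exit status already RootSuspended:
// no restart AND no recorded pinged lane — the stuck state.
console.log(modelPing({ exitStatus: ROOT_SUSPENDED, executionContext: RENDER_CONTEXT }));
// { restarted: false, pingedLanes: 0 }
```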
Agent #2 (Investigation Methodology): 17 Minutes
Agent #2 also started with RecordingOverview and quickly identified the commit mismatch. It then used Replay MCP’s React render trigger details to trace the causation chain from keystrokes to render commits.
It made its way to pingSuspendedRoot and used the Logpoint tool to check the hits and values inside. It got briefly sidetracked on scheduling behavior before coming back to the rendering logic.
Along the way, it actually identified a second potential bug that none of the other runs found. It spotted an isThenableResolved(thenable) call that checks whether a promise is resolved, and identified that RSC promises use a different status value, "resolved_model", instead of "fulfilled". It suggested a tweak to this logic to allow resuming synchronously instead of re-throwing.
In the end it also suggested the common “re-ping lanes” fix that the other agents and the PR had, but this was a fascinating insight, and derived solely from inspecting the runtime behavior in this one recording.
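The status mismatch Agent #2 found can be sketched in a few lines. The status strings below come from the agent’s report, and both functions are simplified stand-ins for React internals, not the real implementations:

```javascript
// Simplified stand-in: React's check only recognizes terminal states.
function isThenableResolved(thenable) {
  const status = thenable.status;
  return status === "fulfilled" || status === "rejected";
}

// Per the report, an RSC chunk whose HTTP data has arrived but hasn't
// been deserialized yet reports the intermediate status "resolved_model".
const rscChunk = { status: "resolved_model" };

console.log(isThenableResolved(rscChunk)); // false — treated as still pending

// Agent #2's suggested tweak: treat the intermediate resolved states as
// resolved too, so the render can resume instead of re-throwing.
function isThenableResolvedPatched(thenable) {
  const status = thenable.status;
  return (
    status === "fulfilled" ||
    status === "rejected" ||
    status === "resolved_model" ||
    status === "resolved_module"
  );
}

console.log(isThenableResolvedPatched(rscChunk)); // true
```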
Agent #2 prompt
You are a debugging agent with access to Replay MCP tools and source code. Your job is to investigate a bug, identify the root cause, and suggest a fix.
Bug Report
App: Next.js app using React 19 with server actions. Source at ~/projects/temp/react-udv-bug.
Behavior:
A textarea input feeds text to a server action (renderAction) which returns Promise<ReactNode>
The promise is wrapped in useDeferredValue, then consumed via use() inside <Suspense>
In dev mode (next dev): the deferred output box tracks the input and catches up after typing stops
In prod mode (next build && next start): the deferred output frequently gets stuck and never catches up to the final input value
Recordings:
Dev (working): 06447062-f403-49f3-8bd8-476a65b448d2
In the broken prod recording, the user typed "hello world from prod" but the deferred output is stuck showing "hello world from pro" — one character behind, permanently.
What we know:
No console errors in either recording
The server action POST requests all return 200 OK with valid data
The app code is identical between dev and prod — only the build mode differs
It only manifests in production builds
Goal: Figure out why the deferred value stops updating in production and identify the root cause.
Investigation Methodology
Follow this structured process:
Step 1: Establish the baseline
Get an overview of both recordings. Compare high-level stats: render commit counts, network request counts, timing. The quantitative difference between dev and prod is your first clue.
Step 2: Confirm the visual bug
Take screenshots at the end of both recordings to verify what each shows. Confirm the prod recording is stuck one character behind.
Step 3: Compare render commit patterns
Look at the sequence of React render commits in both recordings. Focus on the tail end — the last few commits. In a working recording, you'd expect the render pattern to complete cleanly. In the broken one, something is missing or different. What's the last commit in each? What triggered it? What components re-rendered?
Step 4: Map the data flow end-to-end
Trace the full lifecycle: keystroke → server action POST → response → promise resolution → deferred value update → component re-render. Verify each step completes in both recordings. If all steps complete but the render doesn't happen, the bug is in React's scheduling, not the data flow.
Step 5: Find the divergence point
The app code is identical. The data flow completes in both. So what's different at the React runtime level? Read the relevant React source code in the recorded bundles. Focus on understanding the algorithm before looking for bugs — understand what SHOULD happen first, then look for where reality diverges.
Step 6: Trace state through the critical path
Once you've identified the relevant scheduling/reconciler functions, trace mutable state values through the critical time window. Focus on the last update that fails in prod — what state does React see? Compare the same state in the dev recording for the equivalent update. The divergence in state values IS the root cause.
Step 7: Explain why dev vs prod
Dev and prod React builds differ: dev is slower (extra validation, warnings, double-renders in StrictMode), prod is faster (optimized, no dev checks). This timing difference can change when asynchronous callbacks (microtasks, promise .then()) fire relative to React's render cycle. If a callback fires at a different point in the render lifecycle between dev and prod, that's likely the mechanism.
Step 8: Red-team your conclusion
Before finalizing: Does your explanation account for WHY it's prod-only? Does your fix address the root cause or just a symptom? Could there be other scenarios that trigger the same bug?
Deliverable
Write a detailed bug report with:
Bug Description — What happens
Root Cause Analysis — What exactly causes the deferred value to get stuck in prod but not dev
Suggested Fix — What code change would fix this
Key Evidence Trail — Table of tools used and findings at each step
Agent #2 final analysis
Bug Report: useDeferredValue Gets Permanently Stuck in Production with RSC Server Actions
1. Bug Description
When using useDeferredValue with a promise returned by a React Server Action inside <Suspense>, the deferred output permanently stops updating in production builds. The last update is lost — the output gets stuck one (or more) characters behind the input and never catches up, even though the server action completes successfully and returns correct data.
Reproduction: Type rapidly into a textarea that triggers a server action on each keystroke. The server action returns Promise<ReactNode>, which is passed through useDeferredValue and consumed via use() inside <Suspense>. In dev mode, output catches up after typing stops. In prod mode, output freezes permanently behind.
2. Root Cause Analysis
This is a race condition between React's Suspense ping mechanism and RSC chunk resolution, exposed only in production due to faster React execution.
The Two Interacting Bugs
Bug A: isThenableResolved doesn't recognize RSC intermediate states
React's isThenableResolved function (react-dom-client.production.js:3394) only checks for status === "fulfilled" or status === "rejected". But RSC flight client chunks use an intermediate "resolved_model" status, meaning HTTP data has arrived but hasn't been fully deserialized yet. When React's renderRootConcurrent resumes a suspended render (case 7: SuspendedOnData) and checks isThenableResolved(thrownValue), it returns false for "resolved_model" chunks and falls through to throwAndUnwindWorkLoop — treating already-available data as still-pending.
Bug B: Synchronous ping fires before lane is marked suspended, creating a dead lane
After throwAndUnwindWorkLoop, React calls attachPingListener which attaches .then(pingSuspendedRoot) to the RSC chunk. But RSC's ReactPromise.prototype.then (line 539) detects the "resolved_model" status, synchronously calls initializeModelChunk() (transitioning to "fulfilled"), and synchronously invokes pingSuspendedRoot — all within the same call stack as attachPingListener.
At this moment, the render is still in progress. root.suspendedLanes is 0 because markRootSuspended hasn't run yet, so the ping computed by pingSuspendedRoot is effectively a no-op.
Later (~26ms), the render completes with exit status 4 (RootSuspended). markRootSuspended finally sets root.suspendedLanes |= 524288. But the ping already fired and won't fire again. The lane is now:
pendingLanes: has bit 19 ✓
suspendedLanes: has bit 19 ✓
pingedLanes: 0 ✗
getNextLanes computes nonIdlePendingLanes & ~suspendedLanes = 0. The lane is permanently stuck — pending but unreachable.
Why Dev Mode Works
Dev React has extra overhead: StrictMode double-renders, validation checks, debug IO tracking in trackUsedThenable (lines 3838-3854). This extra time creates a window where the RSC chunk transitions from "resolved_model" → "fulfilled" before React's case 7 check runs. So isThenableResolved returns true, replaySuspendedUnitOfWork is called instead of throwAndUnwindWorkLoop, and the value is consumed directly — no ping needed.
3. Suggested Fix
There are two complementary fixes, either of which would prevent the dead lane:
Fix A (React reconciler): Recognize RSC intermediate resolved states
In isThenableResolved, also check for "resolved_model" (and potentially other RSC intermediate resolved states like "resolved_module"):
```javascript
// react-reconciler: ReactFiberThenable.js
function isThenableResolved(thenable) {
  var status = thenable.status;
  return (
    status === "fulfilled" ||
    status === "rejected" ||
    status === "resolved_model" ||
    status === "resolved_module"
  );
}
```
This way, when case 7 checks the thenable, it recognizes the data has arrived and calls replaySuspendedUnitOfWork instead of throwAndUnwindWorkLoop. The render succeeds without needing a ping at all.
Fix B (React reconciler): Guard against synchronous pings during render
In pingSuspendedRoot, if the lane isn't yet in suspendedLanes (because render is still in progress), defer the ping or ensure the lane gets re-pinged after markRootSuspended:
```javascript
// react-reconciler: ReactFiberWorkLoop.js
function markRootSuspended(root, suspendedLanes, ...) {
  root.suspendedLanes |= suspendedLanes;
  root.pingedLanes &= ~suspendedLanes;
  // NEW: Check if any of these lanes were pinged while render was in progress.
  // If so, immediately re-ping them.
  if (root._pendingPingsDuringRender & suspendedLanes) {
    root.pingedLanes |= root._pendingPingsDuringRender & suspendedLanes;
    root._pendingPingsDuringRender = 0;
  }
}
```
Or, alternatively in pingSuspendedRoot, detect that a render is in progress and record the ping for later application.
Fix A is the cleaner solution — it avoids the ping mechanism entirely for already-resolved data and handles the root cause (misidentification of RSC chunk state). Fix B is a defense-in-depth measure for any other scenario where synchronous pings could race with in-progress renders.
4. Key Evidence Trail

| Step | Tool | Finding |
| --- | --- | --- |
| — | — | Dev output matches input. Prod stuck at "hello world from pro" — one char behind. |
| Last render commits | ReactRenders commits mode (prod) | Last commit #37 at 5.15s is a setState for the typing event, not a Suspense resolution. No commit for final value. |
| Network verification | NetworkRequest detail (prod, last 2 POSTs) | Both return 200 OK. Final POST contains "hello world from prod" — correct data. |
| Trigger chains | ReactRenders trigger-detail (prod, commits 35-37) | Commit 36 triggered by network response for "hello world from pro". No commit triggered by final response. |
| Ping analysis | Logpoint on pingSuspendedRoot | 3 pings after final keystroke. Ping #3 (Point:114, 5169ms): suspendedLanes=0, ping is a no-op. |
| Re-suspension | Logpoint on markRootSuspended | Point:118 (5191ms): Lane 524288 re-suspended. pingedLanes already 0. |
| Scheduling dead-end | Logpoint on scheduleTaskForRootDuringMicrotask | After re-suspension: getNextLanes returns 0. Lane stuck forever. |
| isThenableResolved | Logpoint on line 3394 + DescribePoint | Returns false for status === "resolved_model". Only checks "fulfilled"/"rejected". |
| Case 7 path | DescribePoint on renderRootConcurrent case 7 | isThenableResolved false → throwAndUnwindWorkLoop instead of replaySuspendedUnitOfWork. |
| RSC chunk .then() | ReadSource on RSC client code | ReactPromise.prototype.then synchronously resolves "resolved_model" → "fulfilled" and invokes callback inline. |
| Dev vs prod timing | DescribePoint comparison of trackUsedThenable | Dev has extra debug IO code (lines 3838-3854) creating enough delay for chunk to reach "fulfilled" before case 7 check. |
5. Impact Assessment
Severity: High — causes permanent UI staleness with no recovery
Scope: Any Next.js app using useDeferredValue + use() + RSC server actions with rapid updates
Workarounds: None clean. Adding artificial delays would mask but not fix. Avoiding useDeferredValue with server actions works but defeats the purpose.
React version affected: 19.3.0-canary-f93b9fd4-20251217 (and likely all React 19 builds with RSC support)
Agent #3 (React Scheduling Concepts): 8 Minutes
Agent #3 started with RecordingOverview to check the output and Screenshot to confirm the problem visually. Since it already had the terminology to describe React’s scheduling internals, it made a beeline for functions like markRootSuspended, even without any actual knowledge of the implementation.
It used the Logpoint tool to check hits for pingSuspendedRoot and markRootSuspended, and quickly identified the relevant ternary logic as the culprit. It then compared against the dev recording’s line hits, and came up with the pinged-lanes solution without any major rabbit trails.
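The “pending but unreachable” end state all three agents converged on is just bitmask arithmetic. Using the lane value from the reports (bit 19 = 524288), here is a simplified sketch of how the scheduler sees the stuck root; getNextLanesSimplified is a rough stand-in for React’s getNextLanes, not the real function:

```javascript
const DEFERRED_LANE = 1 << 19; // 524288, the lane value from the reports

// End state the agents observed on the root after the lost ping:
const root = {
  pendingLanes: DEFERRED_LANE,   // work still pending
  suspendedLanes: DEFERRED_LANE, // marked suspended by markRootSuspended
  pingedLanes: 0,                // the ping fired too early and was lost
};

// Simplified version of the scheduler's question: "is there any pending
// lane that isn't suspended, or any suspended lane that's been pinged?"
function getNextLanesSimplified(root) {
  const unblocked = root.pendingLanes & ~root.suspendedLanes;
  return unblocked !== 0 ? unblocked : root.pingedLanes;
}

console.log(getNextLanesSimplified(root)); // 0 — nothing schedulable, forever

// Re-pinging the lane (what the suggested fixes make possible) unsticks it:
root.pingedLanes |= DEFERRED_LANE;
console.log(getNextLanesSimplified(root)); // 524288
```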
Agent #3 prompt
You are a debugging agent with access to Replay MCP tools and source code. Your job is to investigate a bug, identify the root cause, and suggest a fix.
Bug Report
App: Next.js app using React 19 with server actions. Source at ~/projects/temp/react-udv-bug.
Behavior:
A textarea input feeds text to a server action (renderAction) which returns Promise<ReactNode>
The promise is wrapped in useDeferredValue, then consumed via use() inside <Suspense>
In dev mode (next dev): the deferred output box tracks the input and catches up after typing stops
In prod mode (next build && next start): the deferred output frequently gets stuck and never catches up to the final input value
Recordings:
Dev (working): 06447062-f403-49f3-8bd8-476a65b448d2
In the broken prod recording, the user typed "hello world from prod" but the deferred output is stuck showing "hello world from pro" — one character behind, permanently.
What we know:
No console errors in either recording
The server action POST requests all return 200 OK with valid data
The app code is identical between dev and prod — only the build mode differs
It only manifests in production builds
Goal: Figure out why the deferred value stops updating in production and identify the root cause.
Investigation Methodology
Follow this structured process:
Step 1: Establish the baseline
Get an overview of both recordings. Compare high-level stats: render commit counts, network request counts, timing. The quantitative difference between dev and prod is your first clue.
Step 2: Confirm the visual bug
Take screenshots at the end of both recordings to verify what each shows. Confirm the prod recording is stuck one character behind.
Step 3: Compare render commit patterns
Look at the sequence of React render commits in both recordings. Focus on the tail end — the last few commits. In a working recording, you'd expect the render pattern to complete cleanly. In the broken one, something is missing or different. What's the last commit in each? What triggered it? What components re-rendered?
Step 4: Map the data flow end-to-end
Trace the full lifecycle: keystroke → server action POST → response → promise resolution → deferred value update → component re-render. Verify each step completes in both recordings. If all steps complete but the render doesn't happen, the bug is in React's scheduling, not the data flow.
Step 5: Find the divergence point
The app code is identical. The data flow completes in both. So what's different at the React runtime level? Read the relevant React source code in the recorded bundles. Focus on understanding the algorithm before looking for bugs — understand what SHOULD happen first, then look for where reality diverges.
Step 6: Trace state through the critical path
Once you've identified the relevant scheduling/reconciler functions, trace mutable state values through the critical time window. Focus on the last update that fails in prod — what state does React see? Compare the same state in the dev recording for the equivalent update. The divergence in state values IS the root cause.
Step 7: Explain why dev vs prod
Dev and prod React builds differ: dev is slower (extra validation, warnings, double-renders in StrictMode), prod is faster (optimized, no dev checks). This timing difference can change when asynchronous callbacks (microtasks, promise .then()) fire relative to React's render cycle. If a callback fires at a different point in the render lifecycle between dev and prod, that's likely the mechanism.
Step 8: Red-team your conclusion
Before finalizing: Does your explanation account for WHY it's prod-only? Does your fix address the root cause or just a symptom? Could there be other scenarios that trigger the same bug?
Conceptual Context: React's Scheduling System
These concepts will help you navigate the React internals you'll encounter:
Lanes: React uses a bitmask-based priority system called "lanes." Each update is assigned a lane (e.g., SyncLane, TransitionLane, DeferredLane). useDeferredValue schedules work on deferred lanes, which are lower priority than user-initiated updates.
Suspension and pinging: When a component throws a Promise (Suspense), React marks the update's lanes as "suspended." When the promise resolves, a "ping" fires that marks those lanes as "pinged," telling the scheduler to retry. The lifecycle is: pending → rendering → suspended → pinged → retry rendering → committed.
workInProgress state: During rendering, React tracks the current render's state in module-scoped variables prefixed with workInProgress — things like what root is being rendered, what lanes, what the exit status is, which lanes have been pinged during this render. These are mutable and change throughout the render cycle.
markRootSuspended: When a render suspends, this function marks the lanes as suspended on the root. It uses workInProgressRootPingedLanes to exclude lanes that were pinged during the current render (so they don't get stuck as suspended). It also adds lanes to warmLanes which affects future scheduling.
Concurrent rendering and yielding: In prod, React uses time-slicing — renders can yield back to the browser mid-work and resume later. This means microtasks (like promise .then() callbacks) can fire between render chunks. In dev mode, renders tend to run longer without yielding due to extra dev checks, so microtasks fire after the render completes.
The key timing question: When a suspended promise resolves, the .then() ping callback might fire during an active render (prod, due to yielding) or after the render completes (dev, due to slower execution). If the ping handling code doesn't correctly handle the "ping arrives mid-render" case, the lane can get stuck.
Deliverable
Write a detailed bug report with:
Bug Description — What happens
Root Cause Analysis — What exactly causes the deferred value to get stuck in prod but not dev
Suggested Fix — What code change would fix this
Key Evidence Trail — Table of tools used and findings at each step
Agent #3 final analysis
Bug Report: useDeferredValue Permanently Stalls in Production React Builds
1. Bug Description
When using useDeferredValue with a server action that returns Promise<ReactNode>, wrapped in <Suspense> and consumed via use(), the deferred output frequently stops updating in production builds. The user types "hello world from prod" but the deferred display is stuck showing "hello world from pro" — one character behind, permanently. The same app code works correctly in dev mode.
2. Root Cause Analysis
The bug is in React's pingSuspendedRoot function — specifically, a code path that silently drops a Suspense ping when it arrives during an active concurrent render with RootSuspendedWithDelay exit status.
The mechanism step by step:
1. User types the final 'd' in "prod" (5149ms). React:
Schedules a new server action POST for "hello world from prod"
Begins a concurrent render on lane 524288 (DeferredLane)
2. The server action responds quickly (5155ms, ~6ms round-trip). The promise's .then() callback fires pingSuspendedRoot at 5169ms — while React is still mid-render of the same lane due to concurrent time-slicing.
3. Inside pingSuspendedRoot (line 13073), the ping enters the critical branch:
```javascript
function pingSuspendedRoot(root, wakeable, pingedLanes) {
  // Line 13076: Try to mark root.pingedLanes
  root.pingedLanes |= root.suspendedLanes & pingedLanes;
  //                  ^^^^^^^^^^^^^^^^^^^
  // BUG: root.suspendedLanes is STILL 0 here!
  // The lane hasn't been marked suspended yet (render is still in progress).
  // So this is: 0 |= (0 & 524288) = 0. NO EFFECT.

  root.warmLanes &= ~pingedLanes; // warmLanes also 0 at this point

  // Line 13078: Are we currently rendering this root+lanes?
  workInProgressRoot === root &&                                     // TRUE (same root)
    (workInProgressRootRenderLanes & pingedLanes) === pingedLanes && // TRUE (same lane)
    // Line 13080: Ternary — exitStatus check
    (4 === workInProgressRootExitStatus ||                           // TRUE (RootSuspendedWithDelay = 4)
      (...))
      // FIRST BRANCH taken:
      ? 0 === (executionContext & 2) && prepareFreshStack(root, 0)
        //     ^^^^^^^^^^^^^^^^^^^^^
        // BUG: executionContext IS 2 (RenderContext)! We're mid-render!
        // So 0 === (2 & 2) is FALSE. prepareFreshStack is NOT called.
        // This entire branch is a no-op.
      // ELSE BRANCH (never reached because the ternary condition was truthy):
      : (workInProgressRootPingedLanes |= pingedLanes);
        // ^^^ This would have saved us! But it's unreachable.

  ensureRootIsScheduled(root); // Called, but getNextLanes returns 0
}
```
4. The render completes and calls markRootSuspended (line 12141):
```javascript
function markRootSuspended(root, suspendedLanes, spawnedLane, didAttemptEntireTree) {
  suspendedLanes &= ~workInProgressRootPingedLanes; // 524288 &= ~0 = 524288 (no exclusion)
  root.suspendedLanes |= suspendedLanes;            // Lane 524288 marked SUSPENDED
  root.pingedLanes &= ~suspendedLanes;              // Clears any pings (already 0)
  didAttemptEntireTree && (root.warmLanes |= suspendedLanes); // Lane marked WARM
}
```
5. The lane is now permanently stuck:
root.pendingLanes = 524288 ✓ (work pending)
root.suspendedLanes = 524288 ✗ (excluded from scheduling)
root.pingedLanes = 0 ✗ (no ping recorded to rescue it)
root.warmLanes = 524288 ✗ (excluded from cold-retry path)
6. getNextLanes (line 706) returns 0 — all three rescue paths fail. The scheduler permanently sees no work, and the UI is stuck.
Why dev mode works
In development mode, React runs significantly slower due to:
Extra validation checks, warning infrastructure, DevTools hooks
StrictMode double-rendering
This means renders take longer and never yield to the browser mid-render. Promise .then() microtask callbacks therefore always fire after the render completes. In every single ping across 126 hits in the dev recording, workInProgressRoot !== root and executionContext === 0 — the ping always takes the fast path of just setting root.pingedLanes (line 13076, where root.suspendedLanes is already set) and calling ensureRootIsScheduled. The buggy ternary at line 13078 is never entered.
In production, concurrent time-slicing allows yields mid-render, so microtasks from resolved promises can fire during the render. This creates the timing window where the ping arrives before markRootSuspended runs.
3. Suggested Fix
The fix is in pingSuspendedRoot. When the ping fires mid-render (executionContext has RenderContext bit set) and the exit status prevents prepareFreshStack, the function must still record the ping in workInProgressRootPingedLanes so that markRootSuspended can exclude the lane from suspension.
Option A (minimal fix): When the prepareFreshStack guard fails, fall through to set workInProgressRootPingedLanes:
```javascript
// In pingSuspendedRoot, replace the ternary at lines 13080-13086:

// BEFORE (buggy):
(4 === workInProgressRootExitStatus || (...))
  ? 0 === (executionContext & 2) && prepareFreshStack(root, 0)
  : (workInProgressRootPingedLanes |= pingedLanes);

// AFTER (fixed):
if (4 === workInProgressRootExitStatus || (...)) {
  if (0 === (executionContext & 2)) {
    prepareFreshStack(root, 0);
  } else {
    // Ping arrived mid-render. Can't restart now, but record the ping
    // so markRootSuspended won't mark this lane as suspended.
    workInProgressRootPingedLanes |= pingedLanes;
  }
} else {
  workInProgressRootPingedLanes |= pingedLanes;
}
```
Option B (defensive): Additionally ensure root.pingedLanes is set even when the lane isn't yet suspended, so the ping survives regardless of ordering:
```javascript
// At the top of pingSuspendedRoot, unconditionally record the ping on the root
// for lanes that are pending (not just suspended):
root.pingedLanes |= root.suspendedLanes & pingedLanes;
root.pingedLanes |= root.pendingLanes & pingedLanes; // NEW: catch pre-suspension pings
```
Option A is the more surgical fix addressing the exact logic error. Option B is belt-and-suspenders.
4. Key Evidence Trail

| Step | Tool | Finding |
|------|------|---------|
| Visual confirmation | Screenshot (both recordings) | Dev shows "hello world from prod" in output. Prod shows "hello world from pro" — stuck. |
| Render commit tails | ReactRenders commit mode (both) | Dev ends with SuspenseResumed at 8.47s. Prod ends with SetState at 5.15s — no final SuspenseResumed. |
| Network completion | NetworkRequest detail (prod #36) | Final POST "hello world from prod" returns 200 OK at 5155ms with valid RSC payload. Data arrived. |
| App source code | grepika_get on app/client.tsx | useDeferredValue(promise) → use() in `<Suspense>`. Standard pattern. |
| pingSuspendedRoot analysis | ReadSource + Logpoint on line 13073 | 46 total hits. Hit #5 (Point:114, 5169ms) fires mid-render: executionContext=2, root.suspendedLanes=0, workInProgressRootExitStatus=4. Ping silently dropped. |
| markRootSuspended aftermath | Logpoint on line 12147 + post-state eval | After line 12161: root.suspendedLanes=524288, root.pingedLanes=0, root.warmLanes=524288. Lane permanently stuck. |
| getNextLanes deadlock | ReadSource on line 706 | Three rescue paths all return 0 with stuck state. No work scheduled. |
| Dev comparison | Logpoint on line 13073 (dev recording) | All 126 pings: executionContext=0, workInProgressRoot !== root. Buggy branch never entered. |
| markRootSuspended exclusion logic | ReadSource on line 12147 | suspendedLanes &= ~workInProgressRootPingedLanes — would exclude pinged lanes, but workInProgressRootPingedLanes=0 because the ping was dropped. |
5. Summary
This is a race condition in React's concurrent rendering scheduler. When a Suspense promise resolves during an active concurrent render (via microtask firing during a yield), pingSuspendedRoot enters a code path where:
root.pingedLanes can't be set (lane not yet suspended on root)
prepareFreshStack can't be called (currently in render context)
workInProgressRootPingedLanes isn't set (unreachable else-branch of ternary)
The ping is completely lost. The subsequent markRootSuspended marks the lane as suspended with no ping, and getNextLanes can never find work for it again. The fix is to ensure workInProgressRootPingedLanes is set in the mid-render case, so markRootSuspended excludes the lane from suspension.
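The stuck-lane arithmetic in that summary is easy to reproduce outside React. Here's a minimal sketch in plain JavaScript — not React's actual source, and the simplified getNextLanes formula is my own reduction of the real three-path logic:

```javascript
// A minimal sketch (plain JavaScript, not React's source) of the stuck-lane
// arithmetic. The lane value mirrors the recording: the deferred lane is
// bit 19 (524288).
const DeferredLane = 1 << 19; // 524288

const root = { pendingLanes: DeferredLane, suspendedLanes: 0, pingedLanes: 0 };

// 1. The ping arrives mid-render, BEFORE markRootSuspended has run,
//    so "suspendedLanes & pingedLanes" is 0 and the ping is a no-op:
root.pingedLanes |= root.suspendedLanes & DeferredLane;

// 2. The render then finishes and markRootSuspended runs. The in-render
//    bookkeeping (workInProgressRootPingedLanes) is still 0, so nothing
//    is excluded from suspension:
const workInProgressRootPingedLanes = 0; // the ping was dropped
root.suspendedLanes |= DeferredLane & ~workInProgressRootPingedLanes;
root.pingedLanes &= ~root.suspendedLanes;

// 3. A simplified getNextLanes: pending work that isn't suspended, plus
//    any pinged lanes. With the ping lost, this is 0 forever:
const next = (root.pendingLanes & ~root.suspendedLanes) | root.pingedLanes;
console.log(next); // 0: the scheduler sees no schedulable work
```

Once the ping bit is gone, no amount of waiting helps: every future call to this formula sees the same stuck state.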
Agent #4 started off the same way as #3, with RecordingOverview and Screenshot tool calls. After scanning the source for pingSuspendedRoot in the React bundle, it tried to use Logpoint calls but struggled a bit with the syntax, so it switched to Evaluate instead to similarly retrieve real values in scope at various execution points.
It found the relevant ternary logic in 4 minutes, confirmed the issue and compared with the dev build, did some red-team review, and wrote up the final report with the correct pinged lanes solution.
This run was the most efficient: it read the source, knew where to instrument, and produced the best final report.
Agent #4 prompt
You are a debugging agent with access to Replay MCP tools and source code. Your job is to investigate a bug, identify the root cause, and suggest a fix.
Bug Report
App: Next.js app using React 19 with server actions. Source at ~/projects/temp/react-udv-bug.
Behavior:
A textarea input feeds text to a server action (renderAction) which returns Promise<ReactNode>
The promise is wrapped in useDeferredValue, then consumed via use() inside <Suspense>
In dev mode (next dev): the deferred output box tracks the input and catches up after typing stops
In prod mode (next build && next start): the deferred output frequently gets stuck and never catches up to the final input value
Recordings:
Dev (working): 06447062-f403-49f3-8bd8-476a65b448d2
In the broken prod recording, the user typed "hello world from prod" but the deferred output is stuck showing "hello world from pro" — one character behind, permanently.
What we know:
No console errors in either recording
The server action POST requests all return 200 OK with valid data
The app code is identical between dev and prod — only the build mode differs
It only manifests in production builds
Goal: Figure out why the deferred value stops updating in production and identify the root cause.
Investigation Methodology
Follow this structured process:
Step 1: Establish the baseline
Get an overview of both recordings. Compare high-level stats: render commit counts, network request counts, timing. The quantitative difference between dev and prod is your first clue.
Step 2: Confirm the visual bug
Take screenshots at the end of both recordings to verify what each shows. Confirm the prod recording is stuck one character behind.
Step 3: Compare render commit patterns
Look at the sequence of React render commits in both recordings. Focus on the tail end — the last few commits. In a working recording, you'd expect the render pattern to complete cleanly. In the broken one, something is missing or different. What's the last commit in each? What triggered it? What components re-rendered?
Step 4: Map the data flow end-to-end
Trace the full lifecycle: keystroke → server action POST → response → promise resolution → deferred value update → component re-render. Verify each step completes in both recordings. If all steps complete but the render doesn't happen, the bug is in React's scheduling, not the data flow.
Step 5: Find the divergence point
The app code is identical. The data flow completes in both. So what's different at the React runtime level? Read the relevant React source code in the recorded bundles. Focus on understanding the algorithm before looking for bugs — understand what SHOULD happen first, then look for where reality diverges.
Step 6: Trace state through the critical path
Once you've identified the relevant scheduling/reconciler functions, trace mutable state values through the critical time window. Focus on the last update that fails in prod — what state does React see? Compare the same state in the dev recording for the equivalent update. The divergence in state values IS the root cause.
Step 7: Explain why dev vs prod
Dev and prod React builds differ: dev is slower (extra validation, warnings, double-renders in StrictMode), prod is faster (optimized, no dev checks). This timing difference can change when asynchronous callbacks (microtasks, promise .then()) fire relative to React's render cycle. If a callback fires at a different point in the render lifecycle between dev and prod, that's likely the mechanism.
Step 8: Red-team your conclusion
Before finalizing: Does your explanation account for WHY it's prod-only? Does your fix address the root cause or just a symptom? Could there be other scenarios that trigger the same bug?
Conceptual Context: React's Scheduling System
These concepts will help you navigate the React internals you'll encounter:
Lanes: React uses a bitmask-based priority system called "lanes." Each update is assigned a lane (e.g., SyncLane, TransitionLane, DeferredLane). useDeferredValue schedules work on deferred lanes, which are lower priority than user-initiated updates.
Suspension and pinging: When a component throws a Promise (Suspense), React marks the update's lanes as "suspended." When the promise resolves, a "ping" fires that marks those lanes as "pinged," telling the scheduler to retry. The lifecycle is: pending → rendering → suspended → pinged → retry rendering → committed.
workInProgress state: During rendering, React tracks the current render's state in module-scoped variables prefixed with workInProgress — things like what root is being rendered, what lanes, what the exit status is, which lanes have been pinged during this render. These are mutable and change throughout the render cycle.
markRootSuspended: When a render suspends, this function marks the lanes as suspended on the root. It uses workInProgressRootPingedLanes to exclude lanes that were pinged during the current render (so they don't get stuck as suspended). It also adds lanes to warmLanes which affects future scheduling.
Concurrent rendering and yielding: In prod, React uses time-slicing — renders can yield back to the browser mid-work and resume later. This means microtasks (like promise .then() callbacks) can fire between render chunks. In dev mode, renders tend to run longer without yielding due to extra dev checks, so microtasks fire after the render completes.
The key timing question: When a suspended promise resolves, the .then() ping callback might fire during an active render (prod, due to yielding) or after the render completes (dev, due to slower execution). If the ping handling code doesn't correctly handle the "ping arrives mid-render" case, the lane can get stuck.
Replay MCP Tool Strategy
You have access to powerful runtime inspection tools. Here's how to use them effectively for this investigation:
RecordingOverview — Start here for each recording. Gets you error counts, render commit stats, network summary, environment info. Run on BOTH recordings first.
ReactRenders — Your primary comparison tool. Use mode: commits to see the full commit sequence. Use mode: commit with commitIndex to drill into individual commits. Focus on the last few commits in each recording — where does prod diverge from dev?
NetworkRequest — Verify all server action POSTs returned successfully. Use to rule out data-level issues early. Check timing of responses relative to render commits.
SearchSources — Find functions in the recorded bundles by name. React 19 ships unminified production bundles, so function names are preserved. Search for scheduling and reconciler functions to find their locations.
ReadSource — Read source code with execution hit counts per line. This is a built-in static profiler — you can see which branches executed and how many times. Use this to understand code flow without needing to instrument anything.
Logpoint — Your most powerful investigation tool. Evaluates any JavaScript expression at a specific source line, every time that line executes. Returns the expression value at each hit with timestamps. This is equivalent to adding console.log statements retroactively — unlimited, no rebuild, no reproduction needed.
Use beginPoint/endPoint to filter hits to a specific time window (critical for high-frequency functions)
Build complex string expressions to capture multiple values in one logpoint: `lanes=${lanes} status=${status} pinged=${root.pingedLanes}`
Place logpoints at key decision points in scheduling functions to trace mutable state over time
DescribePoint / Evaluate — Inspect full program state at a specific execution point. Use after Logpoint identifies an interesting moment — drill in to see all variables, the call stack, and dependency chains.
Screenshot — Confirm visual state at timestamps. Use at the end of each recording to verify the bug visually.
Investigation pattern: Overview both recordings → Compare render commits → Read source to understand the algorithm → Instrument with Logpoints to trace state through the critical window → Find where dev and prod state diverges → Explain the mechanism.
Deliverable
Write a detailed bug report with:
Bug Description — What happens
Root Cause Analysis — What exactly causes the deferred value to get stuck in prod but not dev
Suggested Fix — What code change would fix this
Key Evidence Trail — Table of tools used and findings at each step
Agent #4 final analysis
Bug Report: React useDeferredValue Stuck in Production — Race Condition in pingSuspendedRoot
1. Bug Description
When using useDeferredValue with a Promise<ReactNode> (fed by a server action) inside <Suspense>, the deferred output permanently stops updating in production builds. The final typed character never renders — the deferred value gets stuck one character behind the input. This does not happen in dev mode.
In the investigated case, the user typed "hello world from prod" but the output is permanently stuck showing "hello world from pro".
2. Root Cause Analysis
The bug is a race condition in React's pingSuspendedRoot function in react-dom-client.production.js. It only manifests in production because production builds use concurrent time-slicing, which allows microtasks (promise .then() callbacks) to fire during an active render.
The Flow That Breaks
User types the final character ('d'). React starts rendering the deferred lane (lane 524288 = 0x80000).
During rendering, the component suspends (throws a Promise) because the server action response hasn't arrived yet. React sets workInProgressRootExitStatus = 4 (RootSuspendedWithDelay) but continues in the render context (executionContext = 2).
The server action response arrives at ~5155ms. The promise resolves, firing the .then() ping callback at 5169ms — while the render is still active.
pingSuspendedRoot fires mid-render with this state:
root.suspendedLanes === 0 (lane hasn't been marked suspended on root yet — that happens later in markRootSuspended)
Three things fail simultaneously:
root.pingedLanes not set: Line 13076 does root.pingedLanes |= root.suspendedLanes & pingedLanes, but root.suspendedLanes is still 0 (not yet marked), so this is a no-op.
prepareFreshStack blocked: The ternary at lines 13080-13086 takes the ? branch (because exitStatus === 4), but prepareFreshStack is guarded by 0 === (executionContext & 2) which is false (mid-render). Result: the expression evaluates to false and nothing happens.
workInProgressRootPingedLanes not set: The : branch (workInProgressRootPingedLanes |= pingedLanes) is only reached when exitStatus is NOT 4 or 3+fallback. Since exitStatus IS 4, we're on the ? branch, and this line never executes.
5191ms — markRootSuspended runs after the render completes:
suspendedLanes = 524288
workInProgressRootPingedLanes = 0 (the ping was lost!)
Line 12147: suspendedLanes &= ~0 → 524288 (no exclusion)
Line 12149: root.suspendedLanes |= 524288 → lane marked suspended
Line 12150: root.pingedLanes &= ~524288 → any ping bit cleared
Line 12151: root.warmLanes |= 524288 → lane added to warm lanes
Result: Lane 524288 is permanently suspended and warm, with no ping to trigger a retry.
Why Dev Mode Works
In dev mode, React is slower due to extra validation, warnings, and StrictMode double-renders. It doesn't yield to the browser mid-render as often. The Replay recording confirms this: the condition workInProgressRoot === root inside pingSuspendedRoot has 0 hits in dev. The ping .then() callback always fires after the render completes, so pingSuspendedRoot takes the early-exit path (no active render) and ensureRootIsScheduled correctly reschedules the retry.
3. The Exact Bug in Code
File: packages/react-reconciler/src/ReactFiberWorkLoop.js (in the React repo)

```javascript
// Current code (simplified):
if (
  workInProgressRoot === root &&
  (workInProgressRootRenderLanes & pingedLanes) === pingedLanes
) {
  if (
    exitStatus === RootSuspended ||
    (exitStatus === RootSuspendedWithDelay && ...)
  ) {
    // TRUE branch: try to restart
    if ((executionContext & RenderContext) === 0) {
      prepareFreshStack(root, 0); // Can't do this mid-render!
    }
    // If mid-render, NOTHING happens — the ping is silently dropped
  } else {
    // FALSE branch: record the ping for later
    workInProgressRootPingedLanes |= pingedLanes; // Never reached when exitStatus === 4
  }
}
```
The bug: When exitStatus === 4 (suspended) AND executionContext has the render bit set (mid-render), neither prepareFreshStack nor workInProgressRootPingedLanes |= pingedLanes executes. The ping is completely lost.
4. Suggested Fix
When the render has suspended and we're mid-render (can't restart), we should still record the ping in workInProgressRootPingedLanes so that markRootSuspended will exclude the pinged lane from being marked suspended:
```javascript
// In pingSuspendedRoot, the ternary handling for exitStatus === 4:
if (
  exitStatus === RootSuspended ||
  (exitStatus === RootSuspendedWithDelay && ...)
) {
  if ((executionContext & RenderContext) === 0) {
    prepareFreshStack(root, 0);
  } else {
    // FIX: Record the ping even when we can't restart mid-render.
    // markRootSuspended will check workInProgressRootPingedLanes
    // and exclude this lane from being marked suspended.
    workInProgressRootPingedLanes |= pingedLanes;
  }
} else {
  workInProgressRootPingedLanes |= pingedLanes;
}
```
This ensures that when markRootSuspended runs at line 12147 (suspendedLanes &= ~workInProgressRootPingedLanes), the pinged lane is excluded and won't be permanently stuck. The subsequent ensureRootIsScheduled (line 13089, which always runs) will then correctly schedule a retry render.
5. Key Evidence Trail

| # | Tool | Recording | Finding |
|---|------|-----------|---------|
| 1–2 | Screenshot | Dev + Prod | Dev shows "hello world from dev" (complete). Prod shows "hello world from pro" (stuck). |
| 3 | ReactRenders (commits) | Prod | Last commit #37 is SetState for final keystroke. No SuspenseResumed follows — missing final commit. |
| 4 | ReactRenders (commits) | Dev | Last commits #123→#124 show SetState→SuspenseResumed. Pattern completes. |
| 5 | NetworkRequest (detail) | Prod | POST #36 at 5149ms sends "hello world from prod", gets 200 OK. Response arrives at 5155ms with valid RSC data. Data is fine. |
| 6 | SearchSources + ReadSource | Prod | Located pingSuspendedRoot (L13073), markRootSuspended (L12141), prepareFreshStack (L12184). |
| 7 | Logpoint + Evaluate | Prod | At Point:114 (5169ms ping): exitStatus=4, executionContext=2, root.suspendedLanes=0, workInProgressRootPingedLanes=0. Confirmed ping is lost. |
| 8 | Logpoint + Evaluate | Prod | At Point:118 (5191ms markRootSuspended): workInProgressRootPingedLanes=0, lane 524288 permanently marked suspended+warm. |
| 9 | ReadSource (hit counts) | Dev | pingSuspendedRoot L10047: workInProgressRoot === root has 0 hits. Dev pings never fire mid-render. Confirms timing difference is the mechanism. |
| 10 | ReadSource (hit counts) | Prod | pingSuspendedRoot L13087: 4 hits on the mid-render path. L13085 (prepareFreshStack): 0 hits. Confirmed: entered the ternary 4 times but prepareFreshStack never executed. |
6. Red-Team Analysis
Does this explain why it's prod-only? Yes. The race requires time-slicing (concurrent rendering with yields), which only happens in production. Dev mode's extra overhead means renders don't yield, so promise callbacks fire after renders complete.
Is this a React bug or an app bug? React bug. The app correctly uses useDeferredValue + Suspense + server actions. The race is entirely within React's scheduler.
Could this affect other patterns? Yes. Any pattern where a promise resolves during an active concurrent render of the same lane — useDeferredValue, lazy(), data fetching with Suspense — could theoretically hit this. The useDeferredValue + server action pattern is particularly susceptible because server action responses arrive asynchronously at unpredictable times.
Is the fix complete? The suggested fix handles the specific case where exitStatus === 4 && (executionContext & 2) !== 0. An alternative fix could be in markRootSuspended itself (checking root.pingedLanes as a fallback), but fixing pingSuspendedRoot at the point where the ping is dropped is cleaner and more direct.
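The dev-versus-prod timing mechanism the agents converged on can also be demonstrated in isolation. A small sketch in plain JavaScript (no React involved): a "render" that yields between chunks observes a resolved promise's .then() callback mid-render, while a fully synchronous render never does.

```javascript
// Sketch (plain JavaScript, no React): why yielding changes when a promise's
// .then() callback can fire relative to a chunked "render".

// Dev-like: one long synchronous render. The microtask can only run after
// the whole render completes, so no chunk ever observes the "ping".
function syncRender(chunks) {
  let pinged = false;
  Promise.resolve().then(() => { pinged = true; });
  const seen = [];
  for (let i = 0; i < chunks; i++) seen.push(pinged); // no yield: microtask waits
  return seen;
}

// Prod-like: the render yields between chunks, so the microtask fires
// mid-render and later chunks observe the ping.
async function timeSlicedRender(chunks) {
  let pinged = false;
  Promise.resolve().then(() => { pinged = true; });
  const seen = [];
  for (let i = 0; i < chunks; i++) {
    seen.push(pinged);
    await null; // yield point: queued microtasks (the ping) run here
  }
  return seen;
}

console.log(syncRender(3)); // [ false, false, false ]
timeSlicedRender(3).then(seen => console.log(seen)); // [ false, true, true ]
```

The bug requires exactly the second shape: code that assumes the ping can only arrive after the render must also handle a ping that lands between chunks.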
Analyzing the Agent Results
I went into this hoping that having Replay recordings available would help prove that Replay runtime data makes it easier to solve hard bugs. I’ve certainly experienced that myself just working on Replay and using it over the last few years! So it was extremely satisfying to see that every single one of the agent runs was able to find this complex React bug and propose the correct fix, based only on the Replay recordings of the bug!
As Dan noted: his own agent wasn’t able to solve the issue itself. It wasn’t until Andrew Clark pointed him in the right direction that his agent was able to add the right log calls to React’s source, rebuild, analyze the logs, repeat the process, and eventually converge on the right answer.
Replay MCP’s tools gave my agents the ability to analyze the runtime behavior without having to keep rebuilding React! The Logpoint tool allowed agents to dynamically evaluate an expression every time a line of code ran, while the Sources tool and its built-in “hit counters per line” values acted as an impromptu profiler to help guide the investigation.
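Those per-line hit counts are surprisingly powerful on their own. A hypothetical illustration (this data shape is invented for the sketch, not the actual MCP output): if a guard line has hits but the call it protects has zero, that branch was entered but its body never ran, which is exactly how the agents spotted that prepareFreshStack was being skipped.

```javascript
// Hypothetical illustration: per-line execution counts act as a free branch
// profiler. The numbers mirror the investigation: the guarding ternary line
// ran 4 times while the prepareFreshStack call ran 0 times.
const hitCounts = [
  { line: 13080, hits: 4, code: "exit-status ternary condition" },
  { line: 13085, hits: 0, code: "prepareFreshStack(root, 0)" },
];

// A zero-hit line directly after a hot guard is a branch whose body never
// executed: a strong signal for where to place the next logpoint.
function deadBranches(counts) {
  const dead = [];
  for (let i = 1; i < counts.length; i++) {
    if (counts[i - 1].hits > 0 && counts[i].hits === 0) {
      dead.push(counts[i].line);
    }
  }
  return dead;
}

console.log(deadBranches(hitCounts)); // [ 13085 ]
```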
Comparing Agent Behaviors
Looking at the prompts and the results, I see a few key differences:
| Dimension | Run 1 (Baseline) | Run 2 (+Method) | Run 3 (+Context) | Run 4 (+Tools) |
|-----------|------------------|-----------------|------------------|----------------|
| Duration | ~28 min | ~17 min | ~8 min | ~7 min |
| Messages | 154 | 111 | 47 | 42 |
| Context compressions | 6 | 4 | 2 | 2 |
| Found ternary bug | ✅ | ✅ | ✅ | ✅ |
| Fix matches PR #36134 | ✅ (Fix A of 3) | Partial (Fix B) | ✅ | ✅ |
| Found isThenableResolved gap | ❌ | ✅ (unique) | ❌ | ❌ |
| Proved sync call chain | ✅ (GetStack) | ✅ (ReadSource) | ❌ | ❌ |
| Quantitative hit counts | ✅ | ✅ | ✅ (best) | ✅ |
| Dev vs prod comparison | ✅ | ✅ | ✅ | ✅ |
| False lead time | ~8 min | ~4 min | ~1 min | <1 min |
| Red-team analysis | Moderate | Moderate | Light | Best |
| Number of fixes proposed | 3 | 2 | 1 | 1 |
| Bottom-up exploration | Extensive | Moderate | Minimal | Minimal |
Even with the most basic instructions, just having the recording and MCP tools was enough to let Agent #1 eventually figure out the root cause and propose a valid solution for an otherwise unsolvable bug.
We’ve all seen this over the last year, but it’s still incredible to me that an AI can dive into a codebase or problem space and orient itself just by reading some files and produce real value.
Prompting and context are still critical
Just giving a more detailed step-by-step investigation pattern cut the investigation time in half. A few paragraphs of “here’s some concepts of what React’s scheduling internals involve” cut it in half again. And as I’ve seen in some other investigations I’ll talk about in a future post, having some relevant skills files available can produce drastically better investigation results.
Dan said his agent “instrumented the React codebase with logs”, and really needed “information over time”. That’s exactly what Replay MCP provided! Tools like RecordingOverview and ReactRenders to surface info on React behavior and error messages; Logpoint and Evaluate to extract specific values at various points in time; Sources to view source files in the bundle and see the hit counts to understand execution; Screenshot to visually inspect the UI at a given point in time; NetworkRequests to check the requests and results; and even more niche tools like GetStack and DescribePoint to inspect the JS execution flow.
Replay MCP provides all these and more, enabling agents to actually understand the runtime behavior over time, without having to rebuild the app with more logging. Capture the recording once with all the tricky timing behavior, investigate as deeply as needed, automatically.
What do Replay MCP’s tools look like?
Here’s some of the actual MCP tool calls the agents ran during this investigation:
RecordingOverview
```markdown
# Recording Overview

## Recording Info & Environment
- **URL:** http://localhost:3000/
- **Duration:** 10.3s
- **React:** v19.3.0-canary-f93b9fd4-20251217 (production)
- **Libraries:** NextJS, React

## Console Errors & Warnings
App healthy — no console errors or warnings detected.

## Screenshot
**Final recording state** — screenshot available at timestamp 9955ms.
Use the **Screenshot** tool with `timestamp: 9955` to view it.

## Network Summary
- **Total requests:** 37
- **Failed:** 1
- **Slow:** 0

### API Patterns
- `POST /` — 21 requests
- `GET /` — 1 request

## React Render Summary
- **Total commits:** 38
- **Total render time:** 107ms

### Costliest Commits
- Commit #24: 54ms, 4 fibers — SuspenseResumed
- Commit #13: 4ms, 4 fibers — SuspenseResumed
- Commit #22: 4ms, 4 fibers — SuspenseResumed
- Commit #19: 4ms, 4 fibers — SuspenseResumed
- Commit #9: 4ms, 4 fibers — SuspenseResumed

## Suggested Next Steps
1. **NetworkRequest** — Investigate failed network requests
   _1 failed request_
2. **ReactRenders** — Analyze React render performance
   _38 commits, 107ms total render time_
3. **Screenshot** — View screenshot of the application state
   _Final recording state at 9955ms_
```
ReactRenders - List

```markdown
# Commits

| #  | Time  | Duration | Fibers | Mutating | Wasted | Waste % | Trigger         |
|---:|------:|---------:|-------:|---------:|-------:|--------:|-----------------|
| 28 | 4.52s | 3.3ms    | 4      | 1        | 3      | 75.0%   | SuspenseResumed |
| 29 | 4.56s | 15µs     | 2      | 1        | 1      | 50.0%   | SetState        |
| 30 | 4.66s | 1.6ms    | 3      | 1        | 2      | 66.7%   | SetState        |
| 31 | 4.67s | 3.3ms    | 4      | 1        | 3      | 75.0%   | SuspenseResumed |
| 32 | 4.76s | 4µs      | 2      | 1        | 1      | 50.0%   | SetState        |
| 33 | 4.77s | 3.3ms    | 4      | 1        | 3      | 75.0%   | SuspenseResumed |
| 34 | 4.83s | 4µs      | 2      | 1        | 1      | 50.0%   | SetState        |
| 35 | 5.07s | 3.3ms    | 4      | 1        | 3      | 75.0%   | SetState        |
| 36 | 5.08s | 3.3ms    | 4      | 1        | 3      | 75.0%   | SuspenseResumed |
| 37 | 5.15s | 4µs      | 2      | 1        | 1      | 50.0%   | SetState        |

Showing 29–38 of 38

## Next Steps
- Use mode=commit&commitIndex=N to inspect a specific commit
- Use mode=summary to see the recording overview
```
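Reading the commit list, the Waste % column appears to be wasted renders divided by fibers rendered in that commit (that definition is my inference from the rows, not something documented here). A quick sanity check against a few rows, using a hypothetical `wastePct` helper:

```javascript
// Sanity-check the Waste % column, assuming it is wasted renders / fibers
// rendered. (This is an inferred definition, not Replay MCP's documented one.)
const rows = [
  { commit: 28, fibers: 4, wasted: 3 }, // table shows 75.0%
  { commit: 29, fibers: 2, wasted: 1 }, // table shows 50.0%
  { commit: 30, fibers: 3, wasted: 2 }, // table shows 66.7%
];

const wastePct = ({ wasted, fibers }) => ((wasted / fibers) * 100).toFixed(1);

for (const row of rows) {
  console.log(`Commit #${row.commit}: ${wastePct(row)}%`);
}
// → Commit #28: 75.0%
// → Commit #29: 50.0%
// → Commit #30: 66.7%
```

The computed percentages match the table, which is why a high Waste % on a tiny commit (like #37's 50% on two fibers) is less alarming than it sounds.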
ReactRenders - Commit

```markdown
# Commit #37

**Timestamp:** 5.15s
**Render duration:** 4µs
**Fibers rendered:** 2 (1 mutating, 1 wasted — 50.0%)

## Triggers
- **SetState** - Source: TestPreviewClient (client.tsx:38:25) - Point: 11682667932122379955827658622763010

## Components
| Component         | Instances | Renders | Wasted | Duration | DOM Muts |
|-------------------|----------:|--------:|-------:|---------:|---------:|
| Resolved          | 1         | 1       | 1      | 4µs      | 0        |
| TestPreviewClient | 1         | 1       | 0      | 0µs      | 1        |

## DOM Mutations (1 total)
- TestPreviewClient: 1

## Next Steps
- Use mode=component&componentName=X to investigate a specific component
- Use mode=commits to browse other commits
- Use mode=summary to see the recording overview
```
Evaluate

```text
Found 46 total hits at turbopack:///[project]/node_modules/.pnpm/next@16.1.6_react-dom@19.2.4_react@19.2.4__react@19.2.4/node_modules/next/dist/compiled/react-dom/cjs/react-dom-client.production.js:13076 (showing first 20)

Hit 1 at Point:239 @ 2676.17ms
  Expression: ["ping", pingedLanes, "suspended", root.suspendedLanes, "renderLanes", workInProgressRootRenderLanes, "exit", workInProgressRootExitStatus, "isRoot", workInProgressRoot === root, "execCtx", executionContext]
  Result: ["ping", 524288, "suspended", 524800, "renderLanes", 0, "exit", 4, "isRoot", false, "execCtx", 0]

Hit 2 at Point:240 @ 2676.21ms
  Expression: ["ping", pingedLanes, "suspended", root.suspendedLanes, "renderLanes", workInProgressRootRenderLanes, "exit", workInProgressRootExitStatus, "isRoot", workInProgressRoot === root, "execCtx", executionContext]
  Result: ["ping", 512, "suspended", 524800, "renderLanes", 0, "exit", 4, "isRoot", false, "execCtx", 0]

Hit 3 at Point:241 @ 2841.09ms
  Expression: ["ping", pingedLanes, "suspended", root.suspendedLanes, "renderLanes", workInProgressRootRenderLanes, "exit", workInProgressRootExitStatus, "isRoot", workInProgressRoot === root, "execCtx", executionContext]
  Result: ["ping", 1048576, "suspended", 1049600, "renderLanes", 0, "exit", 4, "isRoot", false, "execCtx", 0]
// snip
```
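The lane values in those results are bitmasks, since React internally represents lanes as individual bits in an integer. Here's a small sketch of how those numbers can be decoded; `includesLanes` and `laneBits` are hypothetical helpers for illustration, not part of Replay MCP or React's API:

```javascript
// Lane values from Hit 1 above: React lanes are bitmasks, one bit per lane.
const pingedLanes = 524288;   // 2^19: a single lane bit
const suspendedLanes = 524800; // 524288 + 512: two lane bits

// Is every bit in `subset` also set in `set`?
function includesLanes(set, subset) {
  return (set & subset) === subset;
}

// List which individual bit positions are set.
function laneBits(lanes) {
  const bits = [];
  for (let bit = 0; bit < 31; bit++) {
    if (lanes & (1 << bit)) bits.push(bit);
  }
  return bits;
}

console.log(includesLanes(suspendedLanes, pingedLanes)); // true
console.log(laneBits(suspendedLanes)); // [ 9, 19 ] → 512 + 524288
```

Decoding the hits this way shows the pinged lane is indeed among the suspended lanes each time, which is the kind of over-time signal the agent used to narrow down where the stale `useDeferredValue` state was getting stuck.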
Replay MCP: The Time-Travel Superpower Your Agents Need
I joined Replay because I saw the potential of time-travel debugging: it makes debugging drastically faster, enables solving bugs that are otherwise impossible, and provides insight into real runtime behavior that no other monitoring or analysis tool can offer.
Replay DevTools gave humans the ability to investigate and solve bugs with time travel.
Now, Replay MCP gives your agents the same time-travel superpowers.
You can add Replay MCP to your own agents and workflows today! Plug it in, make a recording of a bug or a failed test, and let your agent do the time-travel investigative work for you.
And, coming soon: we’re working on expanding our existing E2E Playwright and Cypress recording integrations to automatically investigate test failures and provide analysis and recommended fixes! This will help your team fix flaky tests, ensure PR code quality, and improve shipping velocity.
Try out Replay MCP and our E2E Test Suites integrations today, and stay tuned - we’ve got a lot more awesome time-travel debugging capabilities coming soon!