Replay Time Travelogue: Improving Nadia’s “Debugging with AI” Results Using Replay MCP
Mark Erikson
In my previous post How Replay MCP Helped Find a React Bug Faster Than Dan Abramov Did, I showed how agents were able to successfully identify and fix a difficult React internals bug using just Replay recordings of the reproduction, and how different prompts and instructions resulted in faster analysis results.
However, most of us aren’t actually working on React’s internals :) We’re using React in apps, and dealing with app-level concerns: page loading times, network requests, good UX, and data correctness. What happens when you use AI to debug those kinds of problems? Can Replay MCP make agents better at finding real-world app bugs?
Background: Nadia’s “Debugging with AI” Example
Nadia Makarevich is the author of the book Advanced React, as well as the excellent blog Developer Way, where she’s published numerous posts on React’s behavior and usage. She recently wrote a post titled “Debugging with AI: Can It Replace an Experienced Developer?”, where she tested LLM debugging abilities against an example Next.js app that had several representative bugs inserted to recreate problems she’d seen in real apps. She talked through her instructions to the LLM, how well it did solving each bug, and described where it came up with insufficient answers or just failed to solve the problem at all.
Nadia used Opus 4.5 to do realistic “blind” investigations on three different app bugs: a Zod schema error, a double loading skeleton, and a Next.js redirect error. She graded the LLM’s results in three ways:
Did the LLM fix the problem?
Did the LLM identify the root cause of the problem correctly?
Did the LLM fix the root cause of the problem correctly?
Her results and final analysis indicated that while the LLM had some success, it struggled to understand the deeper architectural issues and either failed to identify the root cause or gave misleading answers:
| | User Page is Broken | Double skeletons problem | Weird Redirect Error |
|---|---|---|---|
| Did LLM fix the problem? | ✅ | ✅ | ❌ |
| Did LLM identify the root cause of the problem correctly? | ✅ | ❌ | ❌ |
| Did LLM fix the root cause of the problem correctly? | ✅ | ❌ | ❌ |
As she concluded:
💡
So, what's the verdict? Can AI replace an experienced developer for debugging?
No, of course not. AI is very good at pattern recognition, and it can be very, very useful. It will do exceptionally well on standard stuff: schema validation mishaps, forgotten null checks, common runtime errors, you've seen it for yourself already
But even if the fix works, I always retrace it step by step and make sure it's the actual root cause. If the fix doesn't work, however, I almost never iterate with the LLM, other than asking it to find me something in the repo. As you've seen, half of its responses in this case will be very confident hallucinations.
When the problem requires actual understanding of why the system behaves the way it does, or how it should behave, especially from a future or user perspective, the AI fails.
And the skill here isn't knowing how to prompt better. It's knowing when to stop prompting and start thinking.
I took this as a challenge :) I wanted to see if adding Replay recordings and Replay MCP to the mix would help an agent solve these problems any better. I also wanted to see if improving prompts and providing more context could produce better results.
Agentic Time-Travel Debugging with Replay MCP
Let’s step back and recap what Replay MCP actually is.
Replay is a time-travel debugger for web apps. The Replay recording browser captures DVR-style recordings of an entire web app. Because we capture the entire browser’s behavior and inputs, we can replay the app’s execution exactly the way it ran during the original recording. Unlike session replay or prod monitoring tools, Replay lets you do time-travel debugging and inspect the app’s behavior at any point in time: see which lines of code executed, evaluate log statements for every time a line ran, view the DOM tree and React component tree at any point in time, and much more. This makes it possible to investigate and solve bugs in ways no other tool can.
Replay DevTools is our debugging UI for humans - browser devtools with time-travel built in. Replay MCP gives agents those same time-travel debugging capabilities. Agents can open a recording and use the MCP tools to investigate the same way a human would: looking at console messages, adding logpoints to evaluate expressions each time a line of code ran, getting screenshots and stack traces, and getting framework-specific insights into libraries like React, Redux, Zustand, and TanStack Query. This means agents can now do the investigation work for you automatically!
Investigation Process and Setup
For this experiment, I used my own personal agent setup: OpenCode 1.4 and Opus 4.6. I have some file search and context management plugins enabled, but otherwise no specific skills or custom behaviors.
I wanted to run this as a proper experiment: how much do prompts and investigation instructions matter, and how much does Replay MCP help? Bug #1 was pretty trivial, but for Bug #2 and Bug #3, I kicked off four parallel agent investigation sessions. Each agent session was given the same access to the Replay recordings of the bug and a local copy of the demo app source, but with varying prompt instructions:
Light details, bug repro, only told to “write a bug report with a root cause and suggested fix”
Same repro steps, but a detailed 8-step investigation methodology including explaining why the problem is happening
Additional context on the app’s architecture and patterns
Additional relevant framework skills or concepts
How would they do? What differences would we see between them in results or investigation times?
Bug #1: Zod Schema Error in a User Profile
Bug #1 was pretty trivial. Navigate to /users/1, and there’s a Zod schema error shown in both a UI error overlay and in the console messages. Nadia’s initial prompt was:
💡
When I navigate to the /users/1 route, the page shows a "Something went wrong" error. In the browser console, I see the following errors [copy-paste the entire error stack]. Server logs are clean. Can you fix it?
Her agent found this error immediately and correctly identified the fix: making the user address fields optional, and handling the type changes when rendering user.address.
My agent was given somewhat more detailed instructions on investigating and using the MCP, but this one was a gimme anyway. It called the RecordingOverview MCP tool, immediately saw the Zod error, used Screenshot to confirm the UI overlay, then used the NetworkRequest tool to check the actual API response and confirmed the address and phone fields were missing despite the schema marking them as required. It nailed the issue and cause in about 2 minutes.
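To make the mismatch concrete, here is a minimal, dependency-free sketch. It is a stand-in for the app's Zod schema rather than the real code: the `ApiUser` shape, `missingRequiredFields`, and `formatAddress` are hypothetical names, and only the missing `phone` and `address` fields come from the bug itself.

```typescript
// Hypothetical, simplified shape of the /api/user response:
// `phone` and `address` may be absent, as they were in the recording.
interface Address {
  street: string;
  city: string;
}

interface ApiUser {
  id: number;
  name: string;
  phone?: string;
  address?: Address;
}

// Stand-in for a strict schema: returns the required fields the response lacks.
function missingRequiredFields(user: ApiUser, required: (keyof ApiUser)[]): string[] {
  return required.filter((key) => user[key] === undefined);
}

// Once `address` is optional, rendering code has to handle the undefined case.
function formatAddress(address: Address | undefined): string {
  return address ? `${address.street}, ${address.city}` : "No address on file";
}

// A response shaped like the one in the recording: no phone, no address.
const response: ApiUser = { id: 1, name: "John Doe" };
```

Treating `phone` and `address` as required reports both as missing for this response. Treating them as optional moves the work into the rendering code, which is why the fix also touches wherever `user.address` is displayed.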
Bug #1: Agent prompt
Task: Debug a Bug in a Replay Recording
Recording ID: b8c1cf71-0c97-40c2-8b46-93704e6e8407
You have access to a Replay.io recording of a Next.js application that has a bug. The bug manifests after the user navigates to the "User Profile" page.
Your Mission
Investigate the recording to:
Identify what the bug is — what goes wrong from the user's perspective?
Find the root cause — trace it back to the specific code/logic error
Suggest a fix — what code change would resolve this?
Approach
Start with RecordingOverview to get the lay of the land, then use whatever Replay tools make sense:
Screenshot to see what the user saw
UserInteractions to understand what they did
ConsoleMessages for errors/warnings
NetworkRequest for failed/slow requests
ReactComponentTree / ReactRenders for component-level issues
UncaughtException / ReactException for crashes
SearchSources / ReadSource / DescribePoint / Evaluate for code-level investigation
Work methodically. Don't assume what the bug is — discover it from the recording evidence.
Deliverable
Write up your findings:
Bug description — what the user experiences
Root cause — the specific code/logic error, with evidence from the recording
Suggested fix — concrete code change
Key evidence trail — the chain of observations that led to the diagnosis
Keep it concise but thorough. This is a demo of Replay's debugging capabilities.
Bug #1: Agent final analysis
As expected — API routes run on the server and aren't in the browser recording. But we already have the response body, which is the definitive evidence.
Here's the complete bug analysis:
Bug Report
1. Bug Description
When the user navigates to the User Profile page (by clicking around ~7s into the recording), the page crashes and shows a Next.js error boundary instead of the user's profile. The user sees an error screen rather than their profile information.
2. Root Cause
Schema-API mismatch. The UserSchema Zod validator in src/lib/schemas.ts:238-255 requires fields that the /api/user endpoint doesn't return.
The schema expects (among other fields):
phone: z.string() (line 242) — required string
address: z.object({...}) (line 246-252) — required object with street, city, state, postalCode, country
fetchUser() in src/queries/useUser.ts:11 calls UserSchema.parse(data) on the API response
Zod throws ZodError with two issues: phone expected string got undefined, address expected object got undefined
TanStack Query surfaces this as error in the query result
UserPage (line 62-63) checks if (error) and re-throws it: throw error
Next.js ErrorBoundaryHandler catches the thrown error and renders the error screen
Lines 66+ (actual profile rendering) execute 0 times — never reached
3. Suggested Fix
Option A: Fix the API — Add the missing phone and address fields to the /api/user response:
```typescript
// In the API route handler for /api/user
return Response.json({
  id: 1,
  name: "John Doe",
  email: "john.doe@example.com",
  phone: "+1-555-0123", // ADD
  department: "Engineering",
  avatarUrl: null,
  role: "admin",
  address: { // ADD
    street: "123 Main St",
    city: "San Francisco",
    state: "CA",
    postalCode: "94105",
    country: "US",
  },
  joinedAt: "2023-03-15T09:00:00Z",
  lastLoginAt: "2024-01-28T14:32:00Z",
});
```
Option B: Fix the schema — Make phone and address optional in UserSchema so it tolerates the current API response:
Option A is better if the data should exist. Option B is better if the data is genuinely optional.
4. Key Evidence Trail
| Step | Tool | Finding |
|---|---|---|
| 1 | RecordingOverview | ZodError for phone and address fields in console errors at ~8.2s |
| 2 | ConsoleMessages (errors) | Error originates in fetchUser at useUser.ts:15, caught by <UserPage> error boundary |
| 3 | UserInteractions | User clicks at ~7s (navigating to profile page) |
| 4 | NetworkRequest (detail for /api/user) | Response body has no phone or address fields, returns 200 OK |
| 5 | ReadSource (schemas.ts:238) | UserSchema requires phone: z.string() and address: z.object({...}), both non-optional |
| 6 | ReadSource (useUser.ts) | fetchUser() calls UserSchema.parse(data) which throws on the missing fields |
| 7 | ReadSource (UserPage.tsx:55) | Component re-throws the query error at line 63; lines 66+ (profile rendering) have 0 execution hits and are never reached |
| 8 | Screenshot at 15s | Confirms user sees an error boundary crash screen, not the profile |
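The failure chain in that report (parse throws, the query layer surfaces the error, the page re-throws it, the error boundary catches it) can be modeled in a few lines. This is a simplified sketch under my own assumptions, not the app's code: `parseUser`, `runQuery`, `renderUserPage`, and `renderWithBoundary` are hypothetical stand-ins for the Zod schema, TanStack Query, `UserPage`, and the Next.js error boundary.

```typescript
// Result shape loosely modeled on a query hook's return value (hypothetical).
interface QueryResult<T> {
  data?: T;
  error?: Error;
}

interface User {
  name: string;
  phone: string;
}

// Step 1: strict parsing throws when a required field is missing.
function parseUser(raw: Record<string, unknown>): User {
  const { name, phone } = raw;
  if (typeof name !== "string" || typeof phone !== "string") {
    throw new Error("Validation failed: expected string, received undefined");
  }
  return { name, phone };
}

// Step 2: the query layer catches the throw and exposes it as `error`.
function runQuery<T>(queryFn: () => T): QueryResult<T> {
  try {
    return { data: queryFn() };
  } catch (err) {
    return { error: err instanceof Error ? err : new Error(String(err)) };
  }
}

// Step 3: the page re-throws the error, so the profile rendering is never reached.
function renderUserPage(result: QueryResult<User>): string {
  if (result.error) throw result.error;
  return `Profile: ${result.data?.name}`;
}

// Step 4: an error boundary catches the re-thrown error and shows a fallback.
function renderWithBoundary(render: () => string): string {
  try {
    return render();
  } catch {
    return "Something went wrong";
  }
}
```

Feeding `runQuery` a response without `phone` ends at the fallback string, which mirrors the zero execution hits the agent saw on the profile-rendering lines.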
Bug #2: User Page Double Loading Skeleton
After fixing the first issue, Nadia described the second bug in the app: throttle the network connection, navigate from the root page to a user profile, and see two different loading skeletons flash on screen before the final content is displayed. But loading the user profile directly only shows one skeleton.
Nadia’s prompt was:
💡
When I'm on the root page and navigate to the User page, I see two different loading skeletons appear. However, when I just refresh the User page, I see only one. Why is that and how to fix it?
She reported that her agent came up with a variety of answers, several of which involved switching from TanStack Query’s useQuery hook to useSuspenseQuery. That answer is half right, but switching to useSuspenseQuery uncovered a second problem.
The double skeletons came from mixing Next’s loading.tsx page-level Suspense boundary with an app-level skeleton component driven by the TSQ isLoading flag. However, just switching to useSuspenseQuery introduces a hydration mismatch error down the line. The real fix is architectural: actually prefetching data on the server, and hydrating the query hook on the client side with the same data.
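As a rough mental model, the sequence of loading UIs during a client-side navigation depends on three things: whether the route has a loading.tsx fallback, which kind of hook the component uses, and whether the data is already in the cache when the component mounts. The sketch below encodes just that. It is a deliberate simplification with hypothetical names (`RouteSetup`, `loadingPhases`), and it models only which skeletons appear, not the hydration mismatch.

```typescript
type HookMode = "useQuery" | "useSuspenseQuery";

interface RouteSetup {
  hasRouteLoadingFallback: boolean; // a loading.tsx Suspense fallback exists for the route
  hookMode: HookMode;               // how the client component reads its data
  dataInCacheOnMount: boolean;      // true if the data was prefetched and hydrated
}

// Returns the distinct loading UIs a user would see, in order, during navigation.
function loadingPhases(setup: RouteSetup): string[] {
  const phases: string[] = [];

  // Phase 1: the route segment suspends while the server payload loads.
  if (setup.hasRouteLoadingFallback) {
    phases.push("loading.tsx skeleton");
  }

  // Phase 2: the component mounts and, without cached data, waits for the fetch.
  if (!setup.dataInCacheOnMount) {
    phases.push(
      setup.hookMode === "useQuery" ? "component skeleton" : "loading.tsx skeleton"
    );
  }

  // Consecutive identical fallbacks look like one continuous skeleton to the user.
  return phases.filter((phase, i) => i === 0 || phase !== phases[i - 1]);
}
```

With a loading.tsx fallback, `useQuery`, and an empty cache, the model returns two different skeletons, which is the reported bug. Switching the hook or pre-filling the cache collapses that to one.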
Given that, I wanted to see: would an agent with Replay MCP actually find the real root cause and suggest server prefetching as the full solution?
I kicked off three agent sessions: Nadia’s baseline prompt, additional investigative method instructions, and method + some app architecture context. The results were a partial improvement over Nadia’s agent. Here’s how they did.
Bug #2 Agent #1: Basic Prompt
Agent #1 started by inspecting the Replay recording to understand the overall behavior. It called RecordingOverview to get the initial details, UserInteractions to get a list of clicks, and made numerous Screenshot calls to get visuals on the navigation. It then used ListSources and SearchSources to identify the Users page routes and checked for Suspense handling.
A ReadSource call for UserPage.tsx returned hit counts showing it had rendered twice, once with isLoading: true. The agent then made multiple ReactRenders tool calls to check the actual render sequence and triggers. The render details showed a mixture of Suspense and loading-state based causes, confirming the mismatch. Finally, it was able to read the local repo files on disk to confirm the available hooks.
Overall, this agent did a reasonable actual investigation, came up with real evidence, wrapped up the search and reported. The answer was directionally correct, just not complete enough.
Total time: 5 minutes
Bug #2 Agent #1 Prompt
💡
Task: Debug a Bug in a Replay Recording
Recording ID: c832545f-a41b-4cd0-9e59-e229b7ca0b7f
You have access to a Replay.io recording of a web application that has a bug.
Bug report from user:
When I'm on the root page and navigate to the User page, I see two different loading skeletons appear. However, when I just refresh the User page, I see only one. Why is that and how to fix it?
Your Mission
Investigate the recording to:
Understand what's happening in both scenarios (navigation vs refresh)
Find the root cause of the double skeleton
Suggest a fix
Tools Available
Replay MCP tools for analyzing the recording (use replay_ prefixed tools)
The application source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't accessible in the recording
Deliverable
Write up your findings:
Bug description — what the user experiences
Root cause — the specific code/logic error, with evidence
Suggested fix — concrete code change
Key evidence trail — the observations that led to your diagnosis
Bug #2 Agent #1 Solution
💡
Bug Report: Double Loading Skeleton on Client-Side Navigation to User Page
1. Bug Description
When navigating from the root page (/) to the User page (/users/[id]) via client-side navigation, the user sees two different loading skeletons in sequence:
First, a simpler skeleton from loading.tsx (avatar circle + name bar + content area)
Then, a more detailed skeleton from UserPageSkeleton inside the component (avatar + name + badge + separator + grid)
On a direct page refresh of the User page URL, only one skeleton appears. The visual "flash" between two different skeleton designs during navigation is jarring.
2. Root Cause
There are two independent loading mechanisms that fire sequentially during client-side navigation:
The page component at src/app/users/[id]/page.tsx:5-10 is an async server component:
```typescript
export default async function Page({ params }: { params: Promise<{ id: string }> }) {
  const { id } = await params; // <-- this `await` causes suspension
  return <UserPage userId={id} />;
}
```
During client-side navigation, Next.js needs to fetch the server component payload. The await params causes the route segment to suspend, which triggers the loading.tsx Suspense fallback. This renders Skeleton #1 (commit 135 at 8.89s — RenderSuspended trigger, 145 DOM mutations).
When the Suspense boundary resolves (server component payload arrives), UserPage mounts. The useQuery hook starts fetching /api/user and immediately returns isLoading: true, rendering Skeleton #2 (commit 136 at 9.02s — SuspenseResumed trigger).
Why refresh works fine: On a full page refresh, the async server component resolves server-side before sending HTML. The client only sees the useQuery loading state — there's no sequential Suspense → component loading transition.
3. Suggested Fix
The cleanest fix is to switch from useQuery to useSuspenseQuery so that the data fetching integrates with the existing Suspense boundary. The useUserSuspense hook already exists in src/queries/useUser.ts:22-27 but is unused.
The component suspends while data is loading, keeping the loading.tsx skeleton visible
When data arrives, the component renders with the actual user data — no intermediate skeleton
Error handling is automatic (suspense queries throw to the nearest error boundary)
The UserPageSkeleton function and isLoading branch can be removed entirely (or kept for other uses)
Both navigation and refresh paths show a single, consistent loading skeleton from loading.tsx
Alternative fix (if you want to keep useQuery): Delete src/app/users/[id]/loading.tsx so only the component-level skeleton appears. But this is worse because you'd lose the instant loading feedback during navigation — there'd be a blank delay before the component even mounts.
4. Key Evidence Trail
| Step | Evidence | Tool/Source |
|---|---|---|
| Identified two skeletons | React commits 135 + 136 in the recording | ReactRenders commits mode, time range 8.5-10s |
| Commit 135: RenderSuspended trigger | loading.tsx fallback shown with 145 DOM mutations at 8.89s | ReactRenders commit detail |
| Commit 136: SuspenseResumed trigger | UserPage mounts, renders UserPageSkeleton (8 Skeleton children) at 9.02s | ReactRenders commit detail |
| Commit 137: Data arrives | UserPage re-renders with actual user data at 9.72s | ReactRenders commit detail |
| useQuery vs useSuspenseQuery | useUser() uses useQuery, which doesn't integrate with Suspense | Source: src/queries/useUser.ts |
| Unused suspense hook exists | useUserSuspense() already defined but not imported | Source: src/queries/useUser.ts:22-27 |
| page.tsx is async | await params causes server component to suspend during navigation | Source: src/app/users/[id]/page.tsx |
Bug #2 Agent #2: Investigative Process
Agent #2 was given a more detailed 8-step investigative process to follow. How much of a difference would that make?
Agent #2 started with the same initial tool calls: RecordingOverview to orient itself, UserInteractions and Screenshot to understand the usage. It also used NetworkRequest to check the timestamps of calls to /api/user and /api/current-user.
After calling ListSources to check the files in the recording, it opted to read them all from the local filesystem.
From there, it used ReactRenders to focus in on the critical Suspense-related renders and causes, but also started reading the various Next routes to see how they handled loading patterns. It identified that most routes did have an isLoading check, but only the users/[id] route had a loading.tsx file included, explaining why the bug is specific to the User page.
This agent came up with much better explanations and relevant details in its report. It identified a waterfall request happening, and found the loading skeleton mismatch. Its primary suggestion was the same incomplete useSuspenseQuery fix, but it did explicitly say “Ideal optimization would be server-side prefetching to eliminate the waterfall”.
Total time: 9 minutes
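The server-side prefetching that Agent #2 mentioned comes down to filling a cache before the client component mounts, so the hook has nothing to wait for. Here is a toy model of that idea. It is not TanStack Query's API: `toyPrefetch`, `toyDehydrate`, `toyHydrate`, and `readQuery` are hypothetical helpers, and the fetch is synchronous for brevity.

```typescript
// A toy query cache: keys map to already-fetched data.
type ToyCache = Map<string, unknown>;

// "Server": fetch before rendering and store the result under a query key.
function toyPrefetch(cache: ToyCache, key: string, fetcher: () => unknown): void {
  cache.set(key, fetcher());
}

// Serialize the server cache so it can travel with the page payload.
function toyDehydrate(cache: ToyCache): string {
  return JSON.stringify(Array.from(cache.entries()));
}

// "Client": rebuild the cache from the payload before any component mounts.
function toyHydrate(payload: string): ToyCache {
  return new Map(JSON.parse(payload) as [string, unknown][]);
}

// Toy hook: reports a loading state only when the key is missing from the cache.
function readQuery(cache: ToyCache, key: string): { isLoading: boolean; data?: unknown } {
  return cache.has(key)
    ? { isLoading: false, data: cache.get(key) }
    : { isLoading: true };
}

// Server side: prefetch the user, then ship the serialized cache to the client.
const serverCache: ToyCache = new Map();
toyPrefetch(serverCache, "user:1", () => ({ id: 1, name: "John Doe" }));
const payload = toyDehydrate(serverCache);

// Client side: with hydration the hook has data on first read; without it, it loads.
const hydrated = readQuery(toyHydrate(payload), "user:1");
const cold = readQuery(new Map(), "user:1");
```

In this model, `hydrated.isLoading` is false on the very first read while `cold.isLoading` is true. That difference is the whole point of prefetching: there is no second loading phase and no client-side fetch waterfall.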
Bug #2 Agent #2 Prompt
💡
Task: Debug a Bug in a Replay Recording
Recording ID: c832545f-a41b-4cd0-9e59-e229b7ca0b7f
You have access to a Replay.io recording of a web application that has a bug.
Bug report from user:
When I'm on the root page and navigate to the User page, I see two different loading skeletons appear. However, when I just refresh the User page, I see only one. Why is that and how to fix it?
Your Mission
Investigate the recording to find the root cause and suggest a complete fix.
Tools Available
Replay MCP tools for analyzing the recording (use replay_ prefixed tools)
The application source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't accessible in the recording
Investigation Process
Follow this structured approach:
Establish the user experience. Use screenshots and interaction timeline to understand exactly what happens visually in both scenarios (navigation vs refresh). Pin down timestamps.
Map the data flow. Identify what data the page needs, where it comes from, when fetches start relative to navigation, and what components consume the results. Use network requests + source code to build this picture.
Build the render timeline. Use React render analysis to understand what components mount, unmount, and re-render — in what order, triggered by what. Compare the component render sequence between the two scenarios if possible.
Compare the two scenarios explicitly. Navigation and refresh produce different results. What's concretely different in the execution path? Don't just explain one path — explain why they diverge.
Find the proximate cause, then go deeper. After identifying the immediate code-level issue, ask yourself: "Does my proposed fix fully resolve the problem, or does it just reduce the symptom?" Consider:
Does the fix eliminate unnecessary work, or just hide it?
Is there a sequential dependency (waterfall) that shouldn't exist?
Would the fix match how other similar pages in the app work?
Check for patterns. Look at how other pages/routes in the app handle data fetching and loading states. If there's a pattern that works correctly elsewhere, the fix should align with that pattern.
Consider framework idioms. Is the code using the framework's intended data flow patterns? If the framework provides a mechanism for this exact scenario, is it being used?
Red-team your conclusion. Before presenting your fix, challenge it: What would still be suboptimal after applying it? Is there a remaining waterfall or unnecessary loading state? Would an expert reviewing this fix say "yes, but you should also..."?
Deliverable
Write up your findings:
Bug description — what the user experiences in each scenario
Root cause analysis — the specific code/logic/architectural error, with evidence from the recording
Suggested fix — concrete code change(s). If there are multiple layers to the fix (immediate + architectural), describe both.
Key evidence trail — the chain of observations that led to your diagnosis
Remaining concerns — anything that your fix doesn't address, or that you're uncertain about
Bug #2 Agent #2 Solution
💡
Bug Report: Double Loading Skeleton on User Page Navigation
1. Bug Description
What the user sees:
Navigation (root → User page): Two visually distinct loading skeletons appear in sequence:
A small card skeleton (12px avatar, narrow name placeholder) for ~130ms
A larger, more detailed skeleton (24px avatar, separator, 2-column grid) for ~700ms
Then the actual user profile renders
Page refresh on User page: Only one loading skeleton appears before the data renders.
The two skeletons have different layouts, sizes, and levels of detail, making the transition jarring.
2. Root Cause Analysis
The bug is a conflict between two independent loading mechanisms that both activate during client-side navigation but not during SSR/refresh.
useUser() (non-suspense useQuery) starts fetching /api/user after mount
Why navigation shows two skeletons:
User clicks "User" link → Next.js starts client-side route transition
Commit 135 (8.89s): Next.js hits the Suspense boundary at users/[id]/loading.tsx, renders Skeleton #1 while loading the RSC payload and JS chunks
Commit 136 (9.02s): RSC resolves, UserPage component mounts. But useUser() uses useQuery (not suspense-aware), so it returns isLoading: true immediately and renders Skeleton #2 (UserPageSkeleton)
Commit 137 (9.72s): /api/user response arrives (fetched at 9172ms, took 548ms), UserPage re-renders with actual data
Why refresh shows one skeleton:
On a full page load, Next.js SSR renders the server component synchronously. The loading.tsx Suspense boundary either doesn't trigger or resolves instantly during SSR. The client hydrates with UserPage already mounted, and only the client-side useQuery loading state is visible.
The architectural error: UserPage uses useQuery (non-suspense) despite having a Next.js loading.tsx Suspense boundary in the route. This creates two competing loading state mechanisms. The codebase already has useUserSuspense() (using useSuspenseQuery) in src/queries/useUser.ts; it's just not being used.
3. Suggested Fix
Primary fix: Switch UserPage to use useSuspenseQuery and remove the manual loading state handling. This lets the existing loading.tsx Suspense boundary handle all loading UI.
The UserPageSkeleton function (lines 27-55) can also be removed from this file since it's no longer used — the loading.tsx skeleton handles that role.
Additional consideration: The loading.tsx skeleton is currently simpler/smaller than UserPageSkeleton. You may want to update src/app/users/[id]/loading.tsx to match the richer skeleton layout (24px avatar, separator, 2-column grid) so the loading state better matches the final rendered page.
Why this is the correct approach:
Other query files (useOrder.ts, useProduct.ts) also export both useX() and useXSuspense() variants — but those pages don't have loading.tsx files, so the non-suspense version works fine
The users/[id]/loading.tsx file exists specifically to be the loading UI for this route — it should be the single source of loading state
useSuspenseQuery integrates with React Suspense natively, which is exactly how Next.js loading.tsx works (it wraps the page in a <Suspense> boundary)
4. Key Evidence Trail
Screenshots at 8742ms vs 9187ms showed two visually different skeletons appearing in sequence
React render commits 135→136→137 showed the exact sequence: RenderSuspended (loading.tsx) → SuspenseResumed + UserPageSkeleton mount → StoreRerender with data
Network timeline showed /api/user fetch doesn't start until 9172ms (~580ms after click), proving the client-side fetch only begins after the component mounts
Source code confirmed useUser() uses useQuery while useUserSuspense() using useSuspenseQuery exists but is unused
File comparison showed users/[id]/loading.tsx exists but orders/[id]/loading.tsx and products/[id]/loading.tsx do not — explaining why this bug is unique to the User page
5. Remaining Concerns
Waterfall still exists with useSuspenseQuery: The /api/user fetch still won't start until the client component mounts after RSC resolution. During the Suspense phase, the loading.tsx skeleton shows while the RSC payload loads, then Suspense resolves, UserPage mounts, useSuspenseQuery triggers the fetch, and Suspense suspends again while the fetch completes. This means the loading.tsx skeleton appears for the full duration (RSC + API fetch), which is better UX (one consistent skeleton) but the fetch itself is still sequential after RSC.
Ideal optimization would be server-side prefetching: The best Next.js pattern would be to prefetch user data on the server (e.g., using TanStack Query's prefetchQuery in the server component at users/[id]/page.tsx) and dehydrate it. This would eliminate the client-side fetch waterfall entirely — the data would already be in the cache when useSuspenseQuery runs, so no second Suspense suspension would occur.
loading.tsx skeleton fidelity: The current loading.tsx skeleton is less detailed than UserPageSkeleton. After the fix, users will see the simpler skeleton for the entire loading duration, which might feel like a regression in loading UI quality. Consider updating loading.tsx to match the richer layout.
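As a quick sanity check, the timings quoted in Agent #2's report are internally consistent. The arithmetic below uses only numbers from the report (commit times of 8.89s, 9.02s, and 9.72s, plus the fetch start and duration); the variable names are mine.

```typescript
// Timestamps in milliseconds, taken from the commit and network details in the report.
const commit135 = 8890; // loading.tsx skeleton appears
const commit136 = 9020; // UserPage mounts and shows its own skeleton
const commit137 = 9720; // data arrives and the real profile renders
const fetchStart = 9172; // /api/user request starts
const fetchDuration = 548; // how long the request took

// How long each skeleton stayed on screen.
const firstSkeletonMs = commit136 - commit135; // 130
const secondSkeletonMs = commit137 - commit136; // 700

// When the fetch finished, which should line up with the final commit.
const fetchEnd = fetchStart + fetchDuration; // 9720
```

The first skeleton lasts 130ms and the second 700ms, matching the "~130ms" and "~700ms" in the bug description. The fetch ends at 9720ms, exactly when commit 137 renders the real data.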
Bug #2 Agent #3: App Architecture Summary
Agent #3 was given the same investigative steps, but also a few bullet points summarizing the known packages and structure of this app (Next.js + TanStack Query, mix of RSCs and hook loading).
Most of the investigative flow was the same: RecordingOverview, interactions, screenshots, sources, network requests. It then specifically browsed most of the query hook definitions and confirmed that Suspense-enabled query hooks existed, as well as checking more screenshots to confirm the loading skeleton visuals.
This agent came up with the fastest response. It still recommended useSuspenseQuery, but did also list server prefetching as a follow-up step. It also flagged OrderPage and ProductPage as having the same problematic pattern, and noted that the UserPageSkeleton component would become dead code and could be removed.
Total time: 4 minutes
Bug #2 Agent #3 Prompt
💡
Task: Debug a Bug in a Replay Recording
Recording ID: c832545f-a41b-4cd0-9e59-e229b7ca0b7f
You have access to a Replay.io recording of a web application that has a bug.
Bug report from user:
When I'm on the root page and navigate to the User page, I see two different loading skeletons appear. However, when I just refresh the User page, I see only one. Why is that and how to fix it?
Your Mission
Investigate the recording to find the root cause and suggest a complete fix.
Tools Available
Replay MCP tools for analyzing the recording (use replay_ prefixed tools)
The application source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't accessible in the recording
Application Context
Pre-analysis of the repository shows:
Framework: Next.js 14+ using the App Router (app/ directory)
Data fetching: TanStack Query (React Query) for client-side data management
Component model: Mix of React Server Components (RSC) and Client Components
Loading states: Route-level loading.tsx files provide framework-managed loading UI
Route structure: The User Profile page is at src/app/users/[id]/, with its own loading.tsx
Query hooks: The app defines both standard and suspense-aware variants of its query hooks (e.g. useUser and useUserSuspense)
Investigation Process
Follow this structured approach:
Establish the user experience. Use screenshots and interaction timeline to understand exactly what happens visually in both scenarios (navigation vs refresh). Pin down timestamps.
Map the data flow. Identify what data the page needs, where it comes from, when fetches start relative to navigation, and what components consume the results. Use network requests + source code to build this picture.
Build the render timeline. Use React render analysis to understand what components mount, unmount, and re-render — in what order, triggered by what. Compare the component render sequence between the two scenarios if possible.
Compare the two scenarios explicitly. Navigation and refresh produce different results. What's concretely different in the execution path? Don't just explain one path — explain why they diverge.
Find the proximate cause, then go deeper. After identifying the immediate code-level issue, ask yourself: "Does my proposed fix fully resolve the problem, or does it just reduce the symptom?" Consider:
Does the fix eliminate unnecessary work, or just hide it?
Is there a sequential dependency (waterfall) that shouldn't exist?
Would the fix match how other similar pages in the app work?
Check for patterns. Look at how other pages/routes in the app handle data fetching and loading states. If there's a pattern that works correctly elsewhere, the fix should align with that pattern.
Consider framework idioms. Is the code using the framework's intended data flow patterns? If the framework provides a mechanism for this exact scenario, is it being used? Pay particular attention to the boundary between server and client — where does data originate, and is it being passed efficiently across that boundary?
Red-team your conclusion. Before presenting your fix, challenge it: What would still be suboptimal after applying it? Is there a remaining waterfall or unnecessary loading state? Would an expert reviewing this fix say "yes, but you should also..."?
Deliverable
Write up your findings:
Bug description — what the user experiences in each scenario
Root cause analysis — the specific code/logic/architectural error, with evidence from the recording
Suggested fix — concrete code change(s). If there are multiple layers to the fix (immediate + architectural), describe both.
Key evidence trail — the chain of observations that led to your diagnosis
Remaining concerns — anything that your fix doesn't address, or that you're uncertain about
Bug #2 Agent #3 Solution
💡
Bug Report: Double Loading Skeleton on Navigation to User Page
1. Bug Description
What the user experiences:
Navigation (root → User page): Two visually distinct loading skeletons appear in sequence:
Skeleton #1 (~130ms, 9063-9187ms): A dashboard-style skeleton with 4 stat card placeholders and a content grid. This is the root app/loading.tsx fallback, completely wrong for the user page context.
Skeleton #2 (~530ms, 9187-9720ms): A user profile card skeleton with avatar circle and name bars — this is the UserPageSkeleton rendered inside UserPage while useUser() fetches data.
Hard refresh on User page: Only one skeleton appears (the UserPageSkeleton from useUser()'s loading state). The transition is clean.
During client-side navigation, Next.js lazy-loads the page component's JS chunk. While the chunk for app/users/[id]/page.js downloads (~323ms), React suspends. The framework looks for the nearest Suspense boundary with a fallback — but the route-level app/users/[id]/loading.tsx hasn't loaded yet either. So React falls back to app/loading.tsx (the root loading file), which shows a dashboard-style skeleton for a user profile page. This is visually jarring and contextually wrong.
Evidence from React commits:
Commit 135 (8.89s): RenderSuspended — React.lazy init for page.js chunk. 145 DOM mutations — this is the heavy swap to the root skeleton.
Layer 2: Non-suspense data fetching creates a second loading phase
UserPage uses useUser() — a standard useQuery hook that returns { isLoading: true } initially. The component manually checks isLoading and renders <UserPageSkeleton />. This creates a sequential waterfall:
On hard refresh, the JS chunk is included in the initial SSR payload, so the chunk-loading Suspense phase is skipped entirely. The user only sees the useUser() loading state.
Evidence from network timeline:
/api/user fetch doesn't start until 9172ms — 505ms after the navigation click
The 500ms API delay is on top of the chunk-loading waterfall
3. Suggested Fix
The app already has the correct solution defined but unused. Every query hook file exports a useSuspenseQuery variant. The fix is to use it.
Primary fix: Switch UserPage to use useUserSuspense()
Next.js's app/users/[id]/loading.tsx becomes the single loading boundary for both chunk loading AND data fetching
The user sees one contextually-correct skeleton (user profile card) instead of two mismatched ones
The UserPageSkeleton inline component and isLoading check become unnecessary (can be removed)
Error handling is automatic — useSuspenseQuery throws on error, which React error boundaries catch
Same pattern applies to other pages
OrderPage and ProductPage have the identical issue. They should also switch to their suspense variants (useOrderSuspense, useProductSuspense).
4. Key Evidence Trail
Screenshots at 9063ms vs 9187ms showed two visually distinct skeletons in rapid succession
React render commits 135-137 confirmed the three-phase sequence: suspend (chunk) → resume (mount + fetch) → data arrives
Network timeline showed /api/user starting at 9172ms, 505ms after click — confirming the waterfall
Source code revealed useUser() (non-suspense) being used despite useUserSuspense() being defined in the same file
app/loading.tsx confirmed the root skeleton is a dashboard layout — wrong for the user page context
All query hooks export unused suspense variants — the codebase was designed for this pattern but didn't follow through
5. Remaining Concerns
Other pages: OrderPage and ProductPage use the same anti-pattern (useOrder/useProduct + isLoading + manual skeleton). They'll exhibit the same double-skeleton bug on client-side navigation. The fix should be applied consistently.
Server-side prefetching: Even with useSuspenseQuery, the data fetch still starts client-side after the component mounts. The ideal Next.js pattern would be to prefetch on the server (in the RSC page component) and dehydrate into the client, so the TanStack Query cache is warm before the client component mounts. This would eliminate the data-fetching loading state entirely on navigation. But switching to useSuspenseQuery is the correct first step — it fixes the double-skeleton bug and properly integrates with framework loading boundaries.
UserPageSkeleton component: After the fix, the inline UserPageSkeleton in UserPage.tsx becomes dead code. The route-level app/users/[id]/loading.tsx handles all loading UI. The inline skeleton should be removed to avoid confusion.
Bug #2: Initial Results
After running those three sessions, I was pleased to see that the agents had at least found the initial problems and identified the same useSuspenseQuery fix that Nadia’s agent had suggested. But, as she pointed out, that “fix” is incomplete and actually leads to hydration errors.
Meanwhile, it was clear that adding more codebase context and better instructions produces higher quality answers and faster results. Even just adding a few bullet points describing the rough loading patterns used in the app let Agent #3 zero in on the problem much faster.
So, what if I tried giving the agent more info on the actual technologies used in this app? More specifically, what if I gave it access to the official SKILL.md files for Next.js and TanStack Query? How would that change the results? Would the agent be able to identify the real root cause and suggest server prefetching as an actual solution?
Bug #2 Agent #4: Skill Files
I spun up Agent #4 with the same instructions as before (bug report, investigation steps, app context), but added in two extra sections: “CI context” saying this was a manual repro of a known E2E test failure and listing project dependencies, and specific pointers to the SKILL files for TanStack Query, Next.js App Router data fetching, Next.js Suspense boundaries, and Vercel’s “React best practices”.
This agent started by actively reading the skill files. From there, most of the investigation followed the same paths: Replay MCP tools to scan the recording, get screenshots, network requests, sources, and React renders. It also cross-checked all of the query hook files and the different app pages.
However, this time the result was different. While it still recommended useSuspenseQuery as the primary fix, the agent explicitly recommended “Deeper fix: eliminate the waterfall with server-side prefetching” and gave a copy-pasteable example showing use of queryClient.prefetchQuery() in the server component that exactly matched Nadia’s recommended fix! That is the correct answer!
This agent also identified several other concerns, including the Order and Product pages having the same issue, the query key not including userId, and other loading state issues.
Total time: 5 minutes
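To make the "prefetch on the server, hydrate on the client" idea concrete, here is a minimal toy model of a warm versus cold cache. It is only a sketch: `ToyQueryCache` and its methods are hypothetical names invented for illustration, not the TanStack Query API, and the user data is made up.

```typescript
// Toy model of "prefetch on the server, hydrate on the client".
// All names are hypothetical; this is NOT the TanStack Query API.
type CacheSnapshot = Record<string, unknown>;

type ReadResult =
  | { status: "pending" }
  | { status: "success"; data: unknown };

class ToyQueryCache {
  private entries: Record<string, unknown> = {};

  // Server side: store a prefetched result under a serialized key.
  set(key: readonly unknown[], data: unknown): void {
    this.entries[JSON.stringify(key)] = data;
  }

  // Server side: serialize the cache so it can travel with the page payload.
  dehydrate(): CacheSnapshot {
    return JSON.parse(JSON.stringify(this.entries)) as CacheSnapshot;
  }

  // Client side: seed the cache from the server snapshot.
  hydrate(snapshot: CacheSnapshot): void {
    for (const k of Object.keys(snapshot)) {
      this.entries[k] = snapshot[k];
    }
  }

  // Client side: what a component would see on its first render.
  read(key: readonly unknown[]): ReadResult {
    const k = JSON.stringify(key);
    return Object.prototype.hasOwnProperty.call(this.entries, k)
      ? { status: "success", data: this.entries[k] }
      : { status: "pending" };
  }
}

// "Server": prefetch the user, then dehydrate (made-up data).
const serverCache = new ToyQueryCache();
serverCache.set(["user", "1"], { id: "1", displayName: "Example User" });
const dehydratedState = serverCache.dehydrate();

// "Client" without hydration: first render is pending, so a skeleton shows.
const coldClient = new ToyQueryCache();
const coldFirstRender = coldClient.read(["user", "1"]);

// "Client" with hydration: first render already has data, so no skeleton.
const warmClient = new ToyQueryCache();
warmClient.hydrate(dehydratedState);
const warmFirstRender = warmClient.read(["user", "1"]);

console.log(coldFirstRender.status, warmFirstRender.status); // pending success
```

The point of the sketch is just that the loading state disappears when the data arrives with the page instead of being requested after the component mounts.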
Bug #2 Agent #4 Prompt
💡
Task: Debug a Bug in a Replay Recording
Recording ID: c832545f-a41b-4cd0-9e59-e229b7ca0b7f
You have access to a Replay.io recording of a web application that has a bug.
Bug report from user:
When I'm on the root page and navigate to the User page, I see two different
loading skeletons appear. However, when I just refresh the User page, I see
only one. Why is that and how to fix it?
Your Mission
Investigate the recording to find the root cause and suggest a complete fix.
Tools Available
Replay MCP tools for analyzing the recording (use replay_ prefixed tools)
The application source code is available at ~/projects/temp/debugging-with-ai
if you need to read files that aren't accessible in the recording
CI Context
This was reported from a navigation E2E test for the User Profile page that
detected unexpected loading state behavior. The recording captures a manual
reproduction of the issue.
Key dependencies:
Next.js ^15.5.12 (App Router)
React ^19
@tanstack/react-query ^5.90.16
TypeScript ^5.3.0
Application Context
Pre-analysis of the repository shows:
Framework: Next.js 14+ using the App Router (app/ directory)
Data fetching: TanStack Query (React Query) for client-side data management
Component model: Mix of React Server Components (RSC) and Client Components
Loading states: Route-level loading.tsx files provide framework-managed
loading UI
Route structure: The User Profile page is at src/app/users/[id]/, with
its own loading.tsx
Query hooks: The app defines both standard and suspense-aware variants of
its query hooks (e.g. useUser and useUserSuspense)
Technology Skills
You have been provided with best-practices skills for the technologies in this
app. Read the smaller skill files before starting your investigation — they
contain framework-specific patterns and idioms that may be relevant:
~/projects/temp/react-skills/skills/tanstack-query.md — TanStack Query
patterns including SSR, hydration, and query hook variants
~/projects/temp/react-skills/skills/next-data-patterns.md — Next.js App
Router data fetching patterns and decision trees
~/projects/temp/react-skills/skills/next-suspense-boundaries.md — Suspense
boundary and CSR bailout rules
~/projects/temp/react-skills/skills/react-best-practices.md — Comprehensive
React best practices (LARGE file, ~3800 lines). Do NOT read this fully — scan
headers with rg "^#{1,3} " first, then read only sections relevant to the
issue you're investigating.
Investigation Process
Follow this structured approach:
Establish the user experience. Use screenshots and interaction timeline
to understand exactly what happens visually in both scenarios (navigation vs
refresh). Pin down timestamps.
Map the data flow. Identify what data the page needs, where it comes
from, when fetches start relative to navigation, and what components consume
the results. Use network requests + source code to build this picture.
Build the render timeline. Use React render analysis to understand what
components mount, unmount, and re-render — in what order, triggered by what.
Compare the component render sequence between the two scenarios if possible.
Compare the two scenarios explicitly. Navigation and refresh produce
different results. What's concretely different in the execution path? Don't
just explain one path — explain why they diverge.
Find the proximate cause, then go deeper. After identifying the immediate
code-level issue, ask yourself: "Does my proposed fix fully resolve the
problem, or does it just reduce the symptom?" Consider:
Does the fix eliminate unnecessary work, or just hide it?
Is there a sequential dependency (waterfall) that shouldn't exist?
Would the fix match how other similar pages in the app work?
Check for patterns. Look at how other pages/routes in the app handle data
fetching and loading states. If there's a pattern that works correctly
elsewhere, the fix should align with that pattern.
Consider framework idioms. Is the code using the framework's intended
data flow patterns? If the framework provides a mechanism for this exact
scenario, is it being used? Pay particular attention to the boundary between
server and client — where does data originate, and is it being passed
efficiently across that boundary?
Red-team your conclusion. Before presenting your fix, challenge it: What
would still be suboptimal after applying it? Is there a remaining waterfall
or unnecessary loading state? Would an expert reviewing this fix say "yes,
but you should also..."?
Deliverable
Write up your findings:
Bug description — what the user experiences in each scenario
Root cause analysis — the specific code/logic/architectural error, with
evidence from the recording
Suggested fix — concrete code change(s). If there are multiple layers to
the fix (immediate + architectural), describe both.
Key evidence trail — the chain of observations that led to your diagnosis
Remaining concerns — anything that your fix doesn't address, or that
you're uncertain about
Bug #2 Agent #4 Solution
💡
Bug Report: Double Loading Skeleton on User Page Navigation
1. Bug Description
What the user sees:
Navigating to the User page (e.g., from dashboard): Two visually distinct loading skeletons appear in sequence — a simpler card skeleton (~130ms), then a more detailed profile skeleton (~700ms) — before the actual content renders.
Refreshing the User page directly: Only one skeleton (the detailed profile skeleton) appears, then content renders.
The two skeletons have different layouts (different card structure, different number of skeleton elements), creating a jarring visual transition during navigation.
2. Root Cause Analysis
The bug stems from two independent, overlapping loading mechanisms competing for the same UI slot:
src/app/users/[id]/loading.tsx provides a Suspense fallback for the route segment. During client-side navigation, Next.js lazy-loads the page chunk via React.lazy. While the JS chunk is downloading (323ms), React suspends and shows this loading.tsx skeleton.
UserPage (a 'use client' component) uses useUser() — which is standard useQuery, NOT useSuspenseQuery. This means:
The component renders immediately with isLoading: true
It manually returns <UserPageSkeleton /> during loading
When the fetch completes, it re-renders with actual data
Why navigation shows two skeletons:
| Time | Event | Visible |
| --- | --- | --- |
| 8588ms | User clicks navigation link | |
| 8667ms | RSC navigation fetch starts | |
| 8764ms | Page JS chunk fetch starts | |
| 8890ms | React.lazy suspends → loading.tsx shown | Skeleton #1 (route-level) |
| 9020ms | Chunk loaded, UserPage mounts, useUser() fires | |
| 9020ms | isLoading: true → UserPageSkeleton shown | Skeleton #2 (component-level) |
| 9172ms | /api/user fetch actually starts | |
| 9720ms | Data arrives, content renders | User profile |
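As a quick sanity check, the durations implied by the timeline above can be recomputed with a few subtractions. The timestamps are copied from the timeline; the property names are my own labels.

```typescript
// Timestamps in ms, copied from the timeline above. Property names are illustrative labels.
const navTimeline = {
  click: 8588,
  rscFetchStart: 8667,
  chunkFetchStart: 8764,
  routeSkeletonShown: 8890,
  componentSkeletonShown: 9020,
  apiFetchStart: 9172,
  contentRendered: 9720,
};

// How long each skeleton stays on screen.
const routeSkeletonMs =
  navTimeline.componentSkeletonShown - navTimeline.routeSkeletonShown; // 130
const componentSkeletonMs =
  navTimeline.contentRendered - navTimeline.componentSkeletonShown; // 700

// How long the API request waits before it even starts.
const clickToApiFetchMs = navTimeline.apiFetchStart - navTimeline.click; // 584
const rscFetchToApiFetchMs =
  navTimeline.apiFetchStart - navTimeline.rscFetchStart; // 505

console.log({
  routeSkeletonMs,
  componentSkeletonMs,
  clickToApiFetchMs,
  rscFetchToApiFetchMs,
});
```

The two skeleton durations line up with the ~130ms and ~700ms figures in the bug description. The roughly 500ms delay quoted for the /api/user fetch appears to be measured from the start of the RSC navigation fetch rather than from the click itself.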
Why refresh shows only one skeleton:
On a full page load, the page JS is served as part of the initial HTML/SSR bundle. There's no lazy-loading, so loading.tsx never triggers. UserPage mounts immediately, useUser() returns isLoading: true, and only the component-level UserPageSkeleton appears.
The architectural mismatch:
The real issue is that UserPage uses useQuery (non-suspense) while also having a route-level loading.tsx (which implies Suspense-based loading). These are two different loading paradigms:
loading.tsx = "the framework manages loading via Suspense boundaries"
useQuery + manual isLoading check = "the component manages its own loading state"
Using both creates the double-skeleton problem during navigation.
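The mismatch between the two paradigms can be sketched as a tiny state model. This is a simplification for illustration only: the phase and mode names are hypothetical, and it models just the client-side navigation path, not React's actual Suspense machinery.

```typescript
// Toy model of what is on screen during client-side navigation.
// Phase and mode names are hypothetical; this is not how React is implemented.
type Phase = "chunk-loading" | "data-loading" | "ready";
type Mode = "useQuery" | "suspense";

function visibleUI(phase: Phase, mode: Mode): string {
  if (phase === "chunk-loading") {
    return "loading.tsx"; // route-level Suspense fallback
  }
  if (phase === "data-loading") {
    // useQuery: the component renders its own skeleton.
    // suspense: the component suspends, so the route-level fallback stays up.
    return mode === "useQuery" ? "UserPageSkeleton" : "loading.tsx";
  }
  return "UserProfile";
}

// Collect the distinct skeletons the user would see, in order.
function distinctSkeletons(mode: Mode): string[] {
  const phases: Phase[] = ["chunk-loading", "data-loading", "ready"];
  const seen: string[] = [];
  for (const phase of phases) {
    const ui = visibleUI(phase, mode);
    if (ui !== "UserProfile" && seen.indexOf(ui) === -1) {
      seen.push(ui);
    }
  }
  return seen;
}

console.log(distinctSkeletons("useQuery")); // two entries: loading.tsx, then UserPageSkeleton
console.log(distinctSkeletons("suspense")); // one entry: loading.tsx
```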
3. Suggested Fix
Immediate fix: Switch to useSuspenseQuery
The app already defines useUserSuspense() — it's just not being used. The fix is to use it, which unifies the loading mechanism under Suspense:
src/components/pages/UserPage.tsx:
typescript
// BEFORE:
import { useUser } from '@/queries/useUser';

export function UserPage({ userId }: UserPageProps) {
  const { data: user, isLoading, error } = useUser();
  if (isLoading) {
    return <UserPageSkeleton />;
  }
  if (error) {
    throw error;
  }
  // ...
}

// AFTER:
import { useUserSuspense } from '@/queries/useUser';

export function UserPage({ userId }: UserPageProps) {
  const { data: user } = useUserSuspense();
  // No isLoading check needed - Suspense handles it
  // No error check needed - useSuspenseQuery throws to error boundary
  if (!user) {
    return (/* user not found UI */);
  }
  // ... rest of the render
}
With useSuspenseQuery:
During navigation, when UserPage mounts, the query suspends → React shows the same loading.tsx fallback that was already showing for the chunk load. The user sees ONE continuous skeleton.
During refresh, the query suspends on first render → loading.tsx shows ONE skeleton.
Both paths now show the same single skeleton.
The UserPageSkeleton component and the isLoading branch can be removed entirely.
Deeper fix: Eliminate the waterfall with server-side prefetching
Even after the immediate fix, there's still a waterfall: the /api/user fetch only starts after the page JS loads and UserPage mounts (the fetch started at 9172ms, ~500ms after navigation). The ideal pattern in Next.js App Router is to prefetch data on the server and hydrate it:
src/app/users/[id]/page.tsx:
typescript
import { HydrationBoundary, dehydrate } from '@tanstack/react-query';
import { getQueryClient } from '@/providers/getQueryClient';
import { UserPage } from '@/components/pages/UserPage';

interface PageProps {
  params: Promise<{ id: string }>;
}

export default async function Page({ params }: PageProps) {
  const { id } = await params;
  const queryClient = getQueryClient();

  // Prefetch on the server - starts immediately with the RSC request
  await queryClient.prefetchQuery({
    queryKey: ['user'],
    queryFn: async () => {
      // Direct server-side data fetch (no /api roundtrip needed)
      const { getUser } = await import('@/lib/data');
      return getUser();
    },
  });

  return (
    <HydrationBoundary state={dehydrate(queryClient)}>
      <UserPage userId={id} />
    </HydrationBoundary>
  );
}
This eliminates the waterfall by fetching data in parallel with (or even before) the client JS chunk load. When UserPage mounts with useSuspenseQuery, the data is already in the cache — no suspension, no skeleton at all during navigation.
4. Key Evidence Trail
Screenshots at 8829ms vs 9063ms — Two visually distinct skeletons confirmed on screen during navigation
React render commit 136 (9.02s) — SuspenseResumed, UserPage mounts, renders UserPageSkeleton due to isLoading: true (skeleton #2)
React render commit 137 (9.72s) — StoreRerender from TanStack Query → data arrives, profile renders
Network timeline — /api/user fetch starts at 9172ms, 500ms after navigation click, confirming client-side waterfall
Source code — UserPage uses useUser() (standard useQuery) not useUserSuspense(), despite useUserSuspense being defined and available
Pattern comparison — OrderPage and ProductPage have the same useQuery + manual isLoading pattern but do NOT have loading.tsx files, so they only show one skeleton. The user/[id] route is the only one with both mechanisms.
5. Remaining Concerns
OrderPage and ProductPage have the same anti-pattern (just less visible). They use useQuery + manual loading when they could use useSuspenseQuery + route-level loading.tsx for a more consistent architecture. The same waterfall exists for those routes — the API fetch only starts after the component mounts.
The useUser() query key ['user'] doesn't include userId. This means all user pages share the same cache entry regardless of which user ID is in the URL. The userId prop is passed to UserPage but never used in the query. This is likely a separate bug.
No server-side prefetching anywhere in the app. The getQueryClient.ts already has dehydrate configuration including shouldDehydrateQuery for pending queries, suggesting prefetching was planned but never implemented.
The loading.tsx skeleton and UserPageSkeleton show different layouts. Even after fixing the double-skeleton issue, you should decide which skeleton design you want and consolidate. If using the Suspense approach, only loading.tsx matters.
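The query key concern in the list above is easy to demonstrate with a toy cache. This is a minimal sketch assuming a cache keyed by the serialized query key; `fetchOnce` is a hypothetical helper, not part of TanStack Query, and the loader results are made up.

```typescript
// Toy cache keyed by a serialized query key.
// `fetchOnce` is a hypothetical helper, not part of TanStack Query.
const toyCache: Record<string, string> = {};

function fetchOnce(queryKey: readonly unknown[], load: () => string): string {
  const k = JSON.stringify(queryKey);
  let hit = toyCache[k];
  if (hit === undefined) {
    // Only call the loader when nothing is cached under this key.
    hit = load();
    toyCache[k] = hit;
  }
  return hit;
}

// Key without the id: the second user silently gets the first user's data.
const sharedKeyFirst = fetchOnce(["user"], () => "data for user 1");
const sharedKeySecond = fetchOnce(["user"], () => "data for user 2");

// Key with the id: each user gets a separate cache entry.
const idKeyFirst = fetchOnce(["user", 1], () => "data for user 1");
const idKeySecond = fetchOnce(["user", 2], () => "data for user 2");

console.log(sharedKeySecond); // data for user 1
console.log(idKeySecond); // data for user 2
```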
Bug #2 Conclusions
I like Nadia’s rubric for measuring success, but I think we can expand it a bit. There’s both “did it come up with the right answers?”, and “how did it get those answers?”. (This is where I always struggled as a kid in school - I wrote the right answers, why should I have to write down my work to show how I got there? 🤣)
Looking at the four agents, here’s how we’d grade their results:
| | Run 1 (baseline) | Run 2 (+methodology) | Run 3 (+context) | Run 4 (+skills) |
| --- | --- | --- | --- | --- |
| Did it fix the problem? | ✅ | ✅ | ✅ | ✅ |
| Did it identify the root cause correctly? | ⚠️ Proximate only | ⚠️ Proximate only | ⚠️ Proximate only | ✅ |
| Did it investigate beyond the first fix? | ❌ | ✅ | ✅ | ✅ |
| Did it check the rest of the codebase? | ❌ | Partial | ✅ | ✅ |
| Did it produce the correct architectural fix? | ❌ | ❌ | ❌ | ✅ |
The “fix” rating for agents 1-3 is a bit generous given that we know that useSuspenseQuery wasn’t the complete answer. Still, it does address the immediate issue, and if that change was pushed in a PR, the next test run would have had the hydration error and eventually led to implementing the prefetching solution. So, call that a step in the right direction - it’s investigative progress that you as a developer didn’t have to do yourself.
I was impressed to see that just adding a detailed investigative process in the prompt produced better results, faster. I was very impressed to see how much of a difference adding the skills files made! I honestly didn’t know if that would help, but figured it was worth trying for comparison, and it actually did lead to a full and complete architectural fix suggestion.
Bug #3: Next.js Redirect Error
Nadia’s third bug investigation involves a redirect error. Loading /users correctly redirects to /users/1 so that non-admins see their own user details. However, a “Rendered more hooks than during the previous render” React error also shows up.
Nadia’s LLM prompt was again pretty simple:
💡
When I navigate to the /users page, I see a "Something went wrong" page, and in the console I see the following error: [error code]. Investigate why the error happens and how to fix it.
She reported that her LLM completely failed in this case. It suggested a half dozen causes and “believably detailed” potential solutions, all completely wrong.
Nadia had to dig through Next’s GitHub issues, where she found some semi-relevant discussions but no firm fix. Eventually she started ripping out code from her app, and after “multiple days”, found that the <SendAnalyticsData> component was calling a Server Action function inside of a useEffect. Apparently the error was actually due to a combination of a server-side redirect and a Server Action in progress wrapped in Suspense (per this Next.js issue). That’s incredibly obscure - no wonder her agent couldn’t figure it out!
I created a fresh Replay recording of the bug and set up the same parallel agent sessions as before. This time, the question was: would Replay MCP enable any of my agent sessions to even come close to describing the real bug and fix? Or would this also be too obscure even with runtime data available?
Bug #3 Agent #0: No Recording
I tried doing one agent session with just access to the source to see what would happen. Interestingly, it did flag <SendAnalyticsData> as the issue, but came up with a completely bogus explanation - it claimed it had to do with component position after an early return, and said it should be moved earlier in the return logic. Clearly the source code alone wasn’t sufficient to explain what was going on here.
Bug #3 Agent #1: Basic Instructions
Agent #1 went on an incredibly deep spelunking expedition through Next and React’s internals :)
It started with the usual RecordingOverview tool, which immediately surfaced the error message. It got more details via the ReactExceptions and ConsoleMessages tools, identified the Router component as the source of the thrown error, and dug into React’s behavior with the ReactRenders tool.
After reading all of the relevant app source files, it read the source of Router via the sources tools, saw the hit counts, and jumped from there to Next’s useActionQueue internal hook. It identified all the internal hook calls and came up with 11 hooks, and flagged the line return isThenable(state) ? use(state) : state as being relevant. It then used multiple Logpoint and Evaluate calls to trace more runtime execution inside of React internal methods like updateWorkInProgressHook().
The final recommended fix was to move the redirect into middleware. This would sort of resolve the problem, but only as a bandaid - it doesn’t address the Server Action being called in a useEffect.
Still, it’s incredibly impressive how deep inside React’s internals this agent was able to investigate and how much of the actual behavior it identified in the process!
Total time: 15 minutes
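For readers who haven’t looked inside React, the error message comes from how hooks are tracked: as a linked list per component, walked in order on every update. Here is a toy model of that bookkeeping. It is a sketch under that assumption, with hypothetical function names, and it is not React’s actual implementation.

```typescript
// Toy model of hook bookkeeping as a linked list.
// Function names are hypothetical; this is not React's implementation.
interface HookNode {
  value: unknown;
  next: HookNode | null;
}

// Mount: every hook call appends a node to the list.
function mountHooks(values: unknown[]): HookNode | null {
  let head: HookNode | null = null;
  let tail: HookNode | null = null;
  for (const value of values) {
    const node: HookNode = { value, next: null };
    if (tail === null) {
      head = node;
    } else {
      tail.next = node;
    }
    tail = node;
  }
  return head;
}

// Update: every hook call must find a matching node from the previous render.
function updateHooks(previous: HookNode | null, hookCalls: number): void {
  let current = previous;
  for (let i = 0; i < hookCalls; i++) {
    if (current === null) {
      throw new Error("Rendered more hooks than during the previous render.");
    }
    current = current.next;
  }
}

// The suspended render only recorded 4 hooks before use() suspended.
const suspendedRenderHooks = mountHooks(["h1", "h2", "h3", "h4"]);

// The retry walks past those 4 hooks, then calls a 5th in update mode.
let thrownMessage = "";
try {
  updateHooks(suspendedRenderHooks, 5);
} catch (err) {
  thrownMessage = err instanceof Error ? err.message : String(err);
}

console.log(thrownMessage); // Rendered more hooks than during the previous render.
```

Calling `updateHooks(suspendedRenderHooks, 4)` instead walks the list cleanly, which is why the problem only appears when the retry runs more hooks than the suspended attempt recorded.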
Bug #3 Agent #1 Prompt
💡
You are a debugging agent with access to Replay MCP tools and source code. Your job is to investigate a bug, identify the root cause, and suggest a fix.
Bug Report
When I navigate to the /users page, I see a "Something went wrong" page briefly flash, and then it redirects to /users/1. In the browser console, I see the following error: "Rendered more hooks than during the previous render."
The redirect itself is expected — the intention is to redirect non-admin users so they can only see their own profile. But the error flash is not expected and needs to go away.
Server logs are clean, so this is a frontend issue.
Recording
The bug has been captured in Replay recording 20da60dd-0831-491c-a7a8-ea7afe1e1af2. Use Replay MCP tools to investigate what's happening in this recording.
Source Code
The app source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't available in the recording's source maps.
Deliverable
Write a detailed bug report with:
Bug Description — What the user sees
Root Cause Analysis — What exactly causes the error, traced through the code
Suggested Fix — Concrete code changes to fix the root cause
Key Evidence Trail — Table of tools used and findings at each step
Bug #3 Agent #1 Solution
💡
Bug Report: "Rendered more hooks than during the previous render" on /users page
1. Bug Description
When navigating to /users, users briefly see a "Something went wrong" error page (rendered by Next.js's DefaultGlobalError component) for approximately 2 seconds before being redirected to /users/1. The browser console shows:
Error: Rendered more hooks than during the previous render.
The redirect itself is intentional: the server component at src/app/users/page.tsx fetches the current user and calls redirect(`/users/${user.id}`). The error flash is the bug.
2. Root Cause Analysis
This is a bug in Next.js's Router component's interaction with React's use() hook when processing server-side redirects. Here's the exact chain of events:
Step 1: Server Component throws NEXT_REDIRECT
src/app/users/page.tsx calls redirect(), which throws a NEXT_REDIRECT error. This is caught by Next.js's RedirectErrorBoundary, which renders <HandleRedirect>. That component calls router.replace(url) inside a useEffect, dispatching a navigation action.
Step 2: Router's useActionQueue stores a Promise in useState
The navigation action puts a Promise into actionQueue.state. When the Router re-renders, useActionQueue (in use-action-queue.ts) calls use(state) at line 53 where state is the Promise from useState. This causes the render to suspend.
Step 3: Suspended render saves partial hook state
Before reaching use(state), the Router calls these hooks inside useActionQueue:
Total: 4 hooks. Then use(state) suspends. React saves these 4 partial hooks on the work-in-progress fiber.
Step 4: Suspended fiber becomes the alternate
When React retries the render, the suspended fiber (with only 4 hooks) becomes the alternate (the "previous render" that React compares against).
Step 5: use() resolves → dispatcher set incorrectly
When the Promise resolves, use(state) returns a value. React's useThenable function (react-dom-client line 7004-7013) checks whether to set the Mount or Update dispatcher for subsequent hooks:
javascript
// Simplified from React source
if (workInProgressHook.next === null) {
  // No more WIP hooks after use()
  if (alternate !== null && alternate.memoizedState !== null) {
    // Alternate has hooks → use UPDATE dispatcher
    ReactSharedInternals.H = HooksDispatcherOnUpdateInDEV;
  } else {
    // No alternate hooks → use MOUNT dispatcher
    ReactSharedInternals.H = HooksDispatcherOnMountInDEV;
  }
}
The check alternate.memoizedState !== null is too coarse. The alternate does have hooks (its memoizedState is non-null — it has 4 hooks), but all 4 of those hooks have already been consumed during the current render. There are zero remaining hooks to update against. React should set the Mount dispatcher here, but instead sets the Update dispatcher.
Step 6: Next hook call throws
The Router's useMemo at line 207 (immediately after useActionQueue returns) calls updateWorkInProgressHook (the Update-mode hook handler), which does:
javascript
nextCurrentHook = currentHook.next; // null - no more alternate hooks!
if (nextCurrentHook === null) {
  throw Error("Rendered more hooks than during the previous render.");
}
Step 7: Error boundary catches, shows error page
ErrorBoundaryHandler catches this error and renders DefaultGlobalError — the "Something went wrong" page the user sees. The redirect eventually completes via a separate code path, navigating to /users/1.
3. Suggested Fix
The root cause is in React's useThenable implementation, making it a React/Next.js framework bug. However, the app can work around it.
App-level fix — move redirect to middleware:
Create src/middleware.ts:
typescript
import { NextRequest, NextResponse } from 'next/server';

export async function middleware(request: NextRequest) {
  if (request.nextUrl.pathname === '/users') {
    // In a real app, read the user ID from session/cookie
    const userId = 1; // or fetch from auth
    return NextResponse.redirect(new URL(`/users/${userId}`, request.url));
  }
}

export const config = {
  matcher: '/users',
};
This performs the redirect at the HTTP level (301/302) before any React rendering occurs, completely avoiding the client-side Router render cycle that triggers the bug.
Alternative app-level fix — use permanentRedirect or notFound pattern:
Replace the server component redirect with a client-side approach:
typescript
// src/app/users/page.tsx
import { redirect } from 'next/navigation';
import { getUser } from '@/services/user';

export default async function UsersPage() {
  const user = await getUser();
  // Use permanentRedirect for HTTP-level redirect
  redirect(`/users/${user.id}`);
}
If the above still triggers the same issue (since both redirect and permanentRedirect use the same NEXT_REDIRECT mechanism), the middleware approach is the most reliable workaround.
Framework-level fix (in React's useThenable):
The check at react-dom-client.development.js line ~7004 should verify whether the alternate has unconsumed hooks remaining, not merely whether it has any hooks at all:
javascript
// Current (buggy):
if (alternate !== null && alternate.memoizedState !== null) {

// Fixed - check if currentHook (the tracking pointer into the alternate)
// has a .next, meaning there are unconsumed hooks:
if (currentHook !== null && currentHook.next !== null) {
This would correctly detect that all alternate hooks have been consumed and set the Mount dispatcher, allowing subsequent hooks (useMemo, etc.) to mount fresh rather than trying to update against non-existent previous hooks.
4. Key Evidence Trail
| Step | Tool | Finding |
| --- | --- | --- |
| 1 | RecordingOverview | 1 console error at 1050ms: "Rendered more hooks than during the previous render" |
| 2 | ReactRenders (summary) | 4 commits; commit 1 at 1.05s renders DefaultGlobalError - the error flash |
| 3 | ReactException | Two errors: NEXT_REDIRECT (caught by RedirectErrorBoundary) and hooks error (caught by ErrorBoundaryHandler) |
| 4 | ConsoleMessages (detail) | Error stack points to updateWorkInProgressHook in react-dom-client.development.js:6979 |
| 5 | ReadSource (app-router.tsx) | Router component has 13+ hooks; useActionQueue is first, calls use(state) conditionally |
| 6 | ReadSource (use-action-queue.ts) | Line 53: return isThenable(state) ? use(state) : state - conditional use() call |
| 7 | Logpoint (use-action-queue.ts:29) | Confirmed state from useState was a thenable (Promise) during renders at 999ms-1042ms |
| 8 | Logpoint (app-router.tsx:207) | useMemo NOT reached during suspended renders (999ms, 1007ms, 1041ms) - only 4 hooks saved |
| 9 | Logpoint (app-router.tsx:439) | At 1042ms: useMemo IS reached but line 439 is NOT - render throws before completing |
| 10 | Evaluate (at useThenable Point:130) | workInProgressHook.next === null, alternate.memoizedState !== null (4 hooks), all consumed - dispatcher set to Update incorrectly |
| 11 | DescribePoint (Point:114) | Confirmed alternate fiber had exactly 4 hooks in memoizedState chain at the moment of the throw |
Server component calling redirect() — the trigger for the entire chain
Bug #3 Agent #2: Investigative Methodology
Agent #2 was given the same 8-step investigative approach we used earlier with Bug #2.
And Agent #2 actually nailed the exact React behavior and suggested fix!
Agent #2 started with RecordingOverview, saw the Next redirect error, and took a few screenshots to confirm visuals. It then used ReactRenders to dig into the specific React commits around the redirect.
After using SearchSources and ReadSources to read the full Router component and its hit counts, it used ReactRenders again to get the specific React fiber ID for the Router instance. It identified the render cause as a setState call, then used DescribePoint to backtrack the cause of the setState.
As with Agent #1, it traced down into useActionQueue, then read much of the app source. This time, though, it found <SendAnalyticsData>, and immediately flagged sendAnalyticsData as a Server Action. It used a mixture of source reads + hit counts, logpoints, and dependency graph checks to confirm the timing of the calls, and concluded the server response came back while the Router was still mid-redirect.
The suggested fix was indeed replacing the Server Action with a simple fetch call for the analytics request.
Total time: 12 minutes
Bug #3 Agent #2 Prompt
💡
You are a debugging agent with access to Replay MCP tools and source code. Your job is to investigate a bug, identify the root cause, and suggest a fix.
Bug Report
When I navigate to the /users page, I see a "Something went wrong" page briefly flash, and then it redirects to /users/1. In the browser console, I see the following error: "Rendered more hooks than during the previous render."
The redirect itself is expected — the intention is to redirect non-admin users so they can only see their own profile. But the error flash is not expected and needs to go away.
Server logs are clean, so this is a frontend issue.
Recording
The bug has been captured in Replay recording 20da60dd-0831-491c-a7a8-ea7afe1e1af2. Use Replay MCP tools to investigate what's happening in this recording.
Source Code
The app source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't available in the recording's source maps.
Investigation Methodology
Follow this structured 8-step investigation process:
Establish user experience — Take screenshots at key moments to see exactly what the user sees. Build a visual timeline of the bug.
Map data flow — Identify all network requests, their timing, and what components consume the responses. Look for waterfalls or unexpected ordering.
Build render timeline — Use React analysis tools to see the sequence of component renders, mounts, unmounts. Look for unexpected render patterns, error boundaries catching, Suspense transitions.
Compare scenarios explicitly — The bug says "navigating to /users shows an error flash then redirects." Compare what happens during this navigation vs what happens if you directly load /users/1. What's different?
Find proximate cause, then go deeper — Once you find what immediately causes the error, ask "but WHY does this happen?" Keep going deeper. The first cause you find is rarely the root cause. Evaluate whether your fix addresses just the symptom or the actual root cause.
Systematic elimination — If the error source isn't clear from tracing, try identifying which components are on the page and reasoning about which one might be causing the issue. The "hooks" error means a component is rendering with a different number of hooks between renders — which component could be conditionally calling hooks? Look at layout components, providers, and components that appear on multiple pages.
Consider framework idioms — Is the code using Next.js/React/TanStack patterns correctly? Are there known interactions between features (Suspense, redirects, Server Actions, etc.) that could cause this?
Red-team your conclusion — Before finalizing, critically evaluate: Does your fix actually address the root cause? Could it introduce new problems? Are there other components or pages with the same pattern?
Deliverable
Write a detailed bug report with:
Bug Description — What the user sees
Root Cause Analysis — What exactly causes the error, traced through the code
Suggested Fix — Concrete code changes to fix the root cause
Key Evidence Trail — Table of tools used and findings at each step
Bug #3 Agent #2 Solution
💡
Bug Report: "Something went wrong" flash on /users redirect
1. Bug Description
When navigating to /users, a "Something went wrong" error page briefly flashes (~50ms, visible from ~1015ms to ~3050ms in the recording) before the expected redirect to /users/1 completes. The console logs: "Rendered more hooks than during the previous render." The error is caught by Next.js's global error boundary (ErrorBoundaryHandler), which renders global-error.tsx (the "Something went wrong" page). The redirect eventually succeeds, but the UX flash is jarring.
2. Root Cause Analysis
The chain of events:
User navigates to /users. The server component UsersPage (/app/users/page.tsx) calls await getUser() then redirect(`/users/${user.id}`), which throws a NEXT_REDIRECT error.
Commit #0 (~1.00s): Next.js's RedirectErrorBoundary catches the NEXT_REDIRECT. It renders HandleRedirect, which calls router.replace('/users/1') inside a useEffect + React.startTransition. This commit also mounts the layout tree, including SendAnalyticsData components.
Layout effects fire after commit #0:
SendAnalyticsData with value="root-layout" (from /app/layout.tsx) fires its useEffect, calling the server action sendAnalyticsData("root-layout")
SendAnalyticsData with value="users-layout" (from UserLayout.tsx via /app/users/layout.tsx) fires its useEffect, calling sendAnalyticsData("users-layout")
Server actions trigger Router state updates: Each server action call goes through Next.js's callServer → dispatchAppRouterAction → actionQueue.dispatch() → setState(deferredPromise) on the Router's internal state. This sets the Router's useActionQueue state to a Promise (thenable).
The HandleRedirect effect also fires, calling router.replace('/users/1') which dispatches a navigate action — another setState(promise) on the Router.
Commit #1 (~1.05s): The Router re-renders. Now its useActionQueue hook encounters isThenable(state) === true, so it calls use(state) (React 19's use() API). The use() call suspends because the promise is still pending. However, the interaction between:
The error boundary recovery from the NEXT_REDIRECT
...causes React to see a different number of hooks between the Router's previous render and this one. The Router throws "Rendered more hooks than during the previous render." This is caught by the top-level ErrorBoundaryHandler, which renders DefaultGlobalError (the "Something went wrong" page from global-error.tsx).
Commit #2 (~3.05s): The promises resolve, Suspense resumes, the redirect to /users/1 completes, and the correct page renders.
The root cause is the SendAnalyticsData components calling server actions during the same render cycle as a redirect. The server actions dispatch state updates to the Router's action queue while the redirect is also dispatching a navigate action. This concurrent mutation of the Router's state during error boundary recovery triggers the hooks mismatch error.
Specifically:
SendAnalyticsData in the root layout (/app/layout.tsx:24) — fires on every page load
SendAnalyticsData in the users layout (UserLayout.tsx:6) — fires when /users/* loads
Both call sendAnalyticsData() (a server action) from useEffect, which pushes promises into the Router's action queue at exactly the wrong time — while a NEXT_REDIRECT is being processed.
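The dispatch chain described in this analysis can be pictured with a toy model. Everything here is a hypothetical simplification: `ToyActionQueue`, its `dispatch` method, and the state shape are invented for illustration and are not the real Next.js implementation. The sketch only shows how several dispatches landing in the same window each leave the shared state as a pending promise.

```typescript
// Toy model: each dispatched action replaces the state with a pending promise.
interface RouterState {
  url: string;
}

type QueueState = RouterState | Promise<RouterState>;

class ToyActionQueue {
  state: QueueState;
  pending: string[] = [];

  constructor(initial: RouterState) {
    this.state = initial;
  }

  // Hypothetical dispatch: records the action and swaps in a deferred promise.
  dispatch(kind: string, resolveTo: RouterState): void {
    this.pending.push(kind);
    this.state = new Promise<RouterState>((resolve) => {
      setTimeout(() => resolve(resolveTo), 0);
    });
  }
}

const queue = new ToyActionQueue({ url: '/users' });
const wasPlainBefore = !(queue.state instanceof Promise);

// Effects from both layouts fire, then the redirect effect fires.
queue.dispatch('server-action:root-layout', { url: '/users' });
queue.dispatch('server-action:users-layout', { url: '/users' });
queue.dispatch('navigate', { url: '/users/1' });

const isPromiseAfter = queue.state instanceof Promise;
console.log(queue.pending, { wasPlainBefore, isPromiseAfter });
```

Any reader of `queue.state` that branches on "is this a promise?" now takes a different path than it did on the previous render, which is the kind of timing sensitivity the report describes.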
3. Suggested Fix
Option A (Recommended): Don't use server actions for analytics — use a regular API call or fire-and-forget fetch instead.
Server actions in Next.js are tightly coupled to the Router's action queue. Using them for fire-and-forget analytics is an anti-pattern because they trigger Router re-renders. Replace with a plain fetch:
typescript
// src/actions/analytics.ts: convert from server action to API utility
// REMOVE: 'use server';
export async function sendAnalyticsData(name: string) {
  // Use a regular fetch to an API route, or just log client-side
  await fetch('/api/analytics', {
    method: 'POST',
    body: JSON.stringify({ name }),
  });
}
Or simpler — just make the analytics call not go through the server action mechanism:
typescript
// src/components/SetupAnalytics.tsx
'use client';
import { useEffect } from 'react';

// Don't import a server action; use a plain function
function sendAnalyticsBeacon(value: string) {
  // navigator.sendBeacon, fetch to an API route, or third-party SDK
  console.log('analytics data sent', value);
}

export function SendAnalyticsData({ value }: { value: string }) {
  useEffect(() => {
    sendAnalyticsBeacon(value);
  }, [value]);
  return <></>;
}
Option B (Alternative): Wrap the server action call in startTransition with lower priority.
This doesn't fix the fundamental issue but may work around the timing:
However, Option B is fragile — it still couples analytics to the Router internals and may break again with future Next.js changes. Option A is the correct fix.
Option A is recommended because:
Analytics should be fire-and-forget, not coupled to the Router's action queue
Server actions are designed for mutations that need to update the UI (form submissions, data writes), not for side-effect-only telemetry
This eliminates the entire class of bugs where analytics firing causes Router state conflicts
4. Key Evidence Trail
| Step | Tool | Finding |
| --- | --- | --- |
| 1. Visual timeline | Screenshot at 723ms, 1015ms, 1077ms, 4595ms | Confirmed "Something went wrong" page flashes between ~1015ms and ~3050ms |
| 2. Error identification | RecordingOverview | Two error boundary catches: NEXT_REDIRECT at 1005ms, hooks error at 1050ms |
|  | ReadSource on app-call-server.ts, app-router-instance.ts | Server actions → dispatchAppRouterAction → setState(promise) on Router |
| 8. Trigger identification | ReactRenders (trigger-detail, fiber-cause) | Commit #1 triggered by setState from effects fired after commit #0 |
| 9. App source review | Source files on disk | SendAnalyticsData calls server action from useEffect, present in root layout + users layout |
| 10. Timing correlation | Logpoint on sendAnalyticsData calls | "root-layout" fires at 972ms, "users-layout" fires at 1005ms, the same window as NEXT_REDIRECT |
Bug #3 Agent #3: App Context and Skills
Agent #3’s instructions added some basic context about the app setup, and access to the same React / Next / TanStack skill files we used earlier.
And yet it did worse than Agent #2!
It started by reading the skills files, checked RecordingOverview and ReactRenders and some screenshots to establish a timeline, and read through the app sources. It did read <SendAnalyticsData> and looked for places that component was used.
However, it then got lost in the weeds of Router, useActionQueue, and use hook semantics, down to investigating whether use adds a new hook entry to the rendering fiber. It did at least start to question whether this bug was at the app level or internals level :)
Ultimately, its solution described the sequence of events fairly well, but its conclusion was to move the redirect into middleware, which isn’t correct.
This one’s a bit puzzling - maybe the extra skills gave it too much info on Next and React? :)
Total time: 16 minutes
Bug #3 Agent #3 Prompt
💡
You are a debugging agent with access to Replay MCP tools and source code. Your job is to investigate a bug, identify the root cause, and suggest a fix.
Bug Report
When I navigate to the /users page, I see a "Something went wrong" page briefly flash, and then it redirects to /users/1. In the browser console, I see the following error: "Rendered more hooks than during the previous render."
The redirect itself is expected — the intention is to redirect non-admin users so they can only see their own profile. But the error flash is not expected and needs to go away.
Server logs are clean, so this is a frontend issue.
Recording
The bug has been captured in Replay recording 20da60dd-0831-491c-a7a8-ea7afe1e1af2. Use Replay MCP tools to investigate what's happening in this recording.
Source Code
The app source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't available in the recording's source maps.
Investigation Methodology
Follow this structured 8-step investigation process:
Establish user experience — Take screenshots at key moments to see exactly what the user sees. Build a visual timeline of the bug.
Map data flow — Identify all network requests, their timing, and what components consume the responses. Look for waterfalls or unexpected ordering.
Build render timeline — Use React analysis tools to see the sequence of component renders, mounts, unmounts. Look for unexpected render patterns, error boundaries catching, Suspense transitions.
Compare scenarios explicitly — The bug says "navigating to /users shows an error flash then redirects." Compare what happens during this navigation vs what happens if you directly load /users/1. What's different?
Find proximate cause, then go deeper — Once you find what immediately causes the error, ask "but WHY does this happen?" Keep going deeper. The first cause you find is rarely the root cause. Evaluate whether your fix addresses just the symptom or the actual root cause.
Systematic elimination — If the error source isn't clear from tracing, try identifying which components are on the page and reasoning about which one might be causing the issue. The "hooks" error means a component is rendering with a different number of hooks between renders — which component could be conditionally calling hooks? Look at layout components, providers, and components that appear on multiple pages.
Consider framework idioms — Is the code using Next.js/React/TanStack patterns correctly? Are there known interactions between features (Suspense, redirects, Server Actions, etc.) that could cause this?
Red-team your conclusion — Before finalizing, critically evaluate: Does your fix actually address the root cause? Could it introduce new problems? Are there other components or pages with the same pattern?
Architecture Context
The following facts about the app's architecture are known from static analysis:
Framework: Next.js 14+ with App Router
Data fetching: TanStack Query for client-side data management
Component model: Mix of React Server Components (RSC) and Client Components ('use client')
Route structure: Uses loading.tsx files for Suspense fallbacks, layout.tsx for nested layouts
Validation: Zod schemas for API response validation
Layouts: Root layout (app/layout.tsx) and a nested users layout (app/users/layout.tsx) — both contain shared components
Redirect pattern: The /users route uses Next.js redirect() to send users to /users/[id]
Components in layouts: Both the root layout and users layout include utility components like analytics trackers
Technology Skills
Before investigating, read the following skill files for relevant framework patterns. Scan the headers first, then read sections that seem relevant to the bug (hooks error + redirect + error boundary):
~/projects/temp/react-skills/skills/next-data-patterns.md — Next.js App Router data fetching patterns
~/projects/temp/react-skills/skills/next-suspense-boundaries.md — Suspense boundary and streaming rules in Next.js
~/projects/temp/react-skills/skills/tanstack-query.md — TanStack Query patterns including Suspense integration
Deliverable
Write a detailed bug report with:
Bug Description — What the user sees
Root Cause Analysis — What exactly causes the error, traced through the code
Suggested Fix — Concrete code changes to fix the root cause
Key Evidence Trail — Table of tools used and findings at each step
Bug #3 Agent #3 Solution
💡
Bug Report: "Rendered more hooks than during the previous render" on /users redirect
1. Bug Description
What the user sees:
When navigating to /users, the page briefly shows a loading skeleton (~400ms), then flashes a full-page error ("Application error: a client-side exception has occurred") for ~1.5 seconds, then recovers and loads /users/1 successfully.
The redirect to /users/1 is intentional behavior (non-admin users should only see their own profile). The error flash is the bug.
Timeline from recording 20da60dd-0831-491c-a7a8-ea7afe1e1af2:
| Time | What's visible |
| --- | --- |
| 661ms | Loading skeleton (Suspense fallback) |
| 1077ms | Full-page error: "Application error: a client-side exception..." |
| 2561ms | Loading skeleton again (redirect target loading) |
| 4595ms | User profile page for John Doe (success) |
2. Root Cause Analysis
The proximate cause
The error "Rendered more hooks than during the previous render" is thrown by React's hook reconciliation system inside the Next.js Router component (app-router.tsx:207). It is caught by ErrorBoundaryHandler and displayed as a full-page error via DefaultGlobalError.
src/app/users/page.tsx is a Server Component that calls Next.js redirect():
typescript
export default async function UsersPage() {
  const user = await getUser();
  redirect(`/users/${user.id}`);
}
This throws a NEXT_REDIRECT error that's caught by RedirectErrorBoundary, which renders HandleRedirect. HandleRedirect's useEffect dispatches a router.replace() action to the Router's action queue.
Step 2: Router state becomes a thenable
The router.replace() action causes the Router's state (managed by useActionQueue) to become a thenable (a Promise-like object representing the pending navigation). Inside useActionQueue:
javascript
return isThenable(state) ? use(state) : state;
When use(state) is called on a pending thenable, it suspends the Router component. During suspension, React records the hooks executed so far:
Step 3: Thenable resolves, but isThenable() still returns true
The navigation thenable resolves externally. On the next render, the state object is the same thenable — now resolved with {status, value}. But isThenable() only checks typeof promise.then === 'function', and a resolved thenable still has a .then method. So use(state) is called again.
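A minimal sketch of that duck-typing behavior, assuming `isThenable` is a check on a callable `.then` as described in this step. The implementation below is a stand-in written for illustration (with an extra null/object guard), not copied from Next.js.

```typescript
// Stand-in duck-typing check: only looks for a callable `.then`.
function isThenable(value: unknown): value is PromiseLike<unknown> {
  return (
    value !== null &&
    (typeof value === 'object' || typeof value === 'function') &&
    typeof (value as { then?: unknown }).then === 'function'
  );
}

const plainState = { url: '/users/1' };
const pendingState = new Promise<typeof plainState>(() => {}); // never settles
const resolvedState = Promise.resolve(plainState); // already settled

// Settling a promise does not remove its `.then` method,
// so pending and resolved promises look identical to this check.
const results = {
  plain: isThenable(plainState),
  pending: isThenable(pendingState),
  resolved: isThenable(resolvedState),
};
console.log(results);
```

The check cannot tell a pending promise from a settled one, so code that branches on it takes the same branch in both cases.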
Step 4: use() returns synchronously — no suspension
use() on an already-resolved thenable returns the resolved value immediately without suspending. This means useActionQueue returns normally, and the Router continues executing past it.
Step 5: Next hook triggers the error
The Router's next statement after useActionQueue is useMemo at line 207 (hook #4). React's updateWorkInProgressHook compares: the previous fiber (from the suspended render) had 3 hooks, but the current render is now requesting hook #4. React throws:
"Rendered more hooks than during the previous render."
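The comparison in this step can be illustrated with a toy hook cursor. This is a hypothetical model of an update-path render, not React's `updateWorkInProgressHook`; `ToyHook`, `savedHooks`, `renderOnUpdatePath`, and the `devIndicatorA`/`devIndicatorB` labels are placeholder names invented here.

```typescript
interface ToyHook {
  name: string;
  next: ToyHook | null;
}

// Build the hook list saved by a previous render.
function savedHooks(names: string[]): ToyHook | null {
  let head: ToyHook | null = null;
  for (const name of [...names].reverse()) {
    head = { name, next: head };
  }
  return head;
}

// Replay a render on the update path: each hook call must find a saved hook.
function renderOnUpdatePath(previous: ToyHook | null, hookCalls: string[]): string[] {
  const matched: string[] = [];
  let cursor = previous;
  for (const call of hookCalls) {
    if (cursor === null) {
      // No saved hook left to pair with this call.
      throw new Error('Rendered more hooks than during the previous render.');
    }
    matched.push(`${call} <- ${cursor.name}`);
    cursor = cursor.next;
  }
  return matched;
}

// The previous (suspended) render saved 3 hooks; this render asks for a 4th.
const previousHooks = savedHooks(['useState', 'devIndicatorA', 'devIndicatorB']);
let errorMessage = '';
try {
  renderOnUpdatePath(previousHooks, ['useState', 'devIndicatorA', 'devIndicatorB', 'useMemo']);
} catch (err) {
  errorMessage = (err as Error).message;
}
console.log(errorMessage);
```

A render that stops at the same three hooks pairs up cleanly; only the extra fourth call runs off the end of the saved list.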
Why this is fundamentally a timing/framework issue
The bug occurs because React's Suspense mechanism should normally re-render a subtree from scratch after suspension resolves — meaning the suspended fiber's hook count shouldn't be the baseline. However, the specific interaction between:
Next.js's useActionQueue calling use() on a resolved thenable
The dev-only useAppDevRenderingIndicator adding 2 extra hooks (increasing the hook count gap from 1→2 to 3→4+)
React's hook reconciler comparing against the suspended fiber's hooks
...creates a race condition where the hook count mismatch manifests as a user-visible error.
In production, useActionQueue has only 1 hook (useState). The same mechanism could theoretically apply (1 hook → 2 hooks), but production builds may handle the Suspense fiber lifecycle differently, or the timing may prevent the race.
The app's contribution to the bug
The app uses redirect() inside a Server Component page — a documented Next.js pattern. However, this pattern pushes the redirect through the full React rendering pipeline (RSC → Error Boundary → HandleRedirect → Router state update → suspension → resolution → crash). A middleware-level redirect would bypass React entirely.
3. Suggested Fix
Primary fix: Move redirect to middleware
Replace the Server Component redirect with a Next.js middleware redirect. This avoids React rendering entirely:
Create src/middleware.ts:
typescript
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

export function middleware(request: NextRequest) {
  // Redirect /users to /users/1 (non-admin users see only their profile)
  // In a real app, get user ID from session/auth token
  if (request.nextUrl.pathname === '/users') {
    const userId = 1; // TODO: extract from auth session
    return NextResponse.redirect(new URL(`/users/${userId}`, request.url));
  }
}

export const config = {
  matcher: '/users',
};
Simplify src/app/users/page.tsx:
typescript
// This page should never render: middleware redirects /users to /users/[id]
// Keep as a fallback in case middleware is bypassed
import { redirect } from 'next/navigation';
import { getUser } from '@/services/user';

export default async function UsersPage() {
  const user = await getUser();
  redirect(`/users/${user.id}`);
}
Alternative fix: Add a loading.tsx at the /users level
If the middleware approach isn't suitable, adding a src/app/users/loading.tsx file would provide a proper Suspense boundary that might alter the suspension/resolution timing enough to avoid the race condition. However, this is a mitigation, not a fix — it papers over the framework-level timing issue:
This appears to be a bug in Next.js's useActionQueue (or possibly in React's Suspense hook reconciliation). The useAppDevRenderingIndicator hook being called conditionally based on NODE_ENV with an eslint-disable-next-line react-hooks/rules-of-hooks comment is a red flag — it means the hook count differs between dev and production, making dev more susceptible to this class of bugs. This could be worth reporting to the Next.js team.
Confirmed redirect in Server Component, no middleware, no loading.tsx at /users level.
12. Error mechanism
grepika_get on react-dom updateWorkInProgressHook
Confirmed: null === nextCurrentHook → "Rendered more hooks than during the previous render" when fiber.alternate has fewer hooks.
Bug #3: Initial Results
After three sessions, I had mixed results for this bug. The good news was that the Replay recording was enough for Agent #2 to actually find the real no-kidding issue! That’s a huge step forward over Nadia’s results. Still, I wanted to see if there was anything else I could do to improve the analysis here.
Let’s say this was a purpose-built debugging agent, running against a known CI failure. In that case, we’d be able to provide some additional context to describe the app and its structure. That might be useful.
Beyond that, the two agents that didn’t get the answer spent too much time diving down into the internals of Next and React. There was useful information there, but that’s also a rabbit hole that trapped them. What if we gave directions to limit any internals investigation to “20%” effort, and focus 80% on the app itself? Would that curb the tendencies to dive too far?
Bug #3 Agent #4: App Context and 80/20 Guidance
I kicked off Agent #4 with a key update to the previous instructions:
💡
Follow this structured investigation process. Budget your time: spend ~80% on app code, component tree, effects, and state flow. Spend at most ~20% reading framework internals. If you find yourself multiple levels deep in Next.js router source or React reconciler code, stop and ask: "Do I already know enough to identify which app component is the trigger?"
I also refined the investigative process steps to encourage it to focus on actionable results, and to lead the report with a focus on the trigger.
As usual, Agent #4 started by calling RecordingOverview, saw the redirect error, checked some screenshots, used ReactRenders to check some of the render causes, and also read ReactExceptions to get more details on the error.
However, it then focused on reading the source code for the components… and immediately identified that <SendAnalyticsData> was calling a Server Action inside of a useEffect!
From there, it dug further into the dependency graph triggers for the renders. Then it examined the network requests, found the analytics POSTs, and confirmed the timing compared to the redirect error. It did start to dive into React, but actually stopped and explicitly said “I’ve been spending too much time on React internals - let me step back and focus on the app-level mechanism”. Clearly that part of the instructions worked :)
The final solution and fix were exactly right! It pointed to the Server Action in a useEffect running while Next was mid-redirect, gave the interleaved timeline of events, and specifically recommended using fetch or navigator beacons instead of a Server Action.
Total time: 10 minutes
Bug #3 Agent #4 Prompt
💡
You are a debugging agent with access to Replay MCP tools and source code. Your job is to investigate a bug, identify the root cause, and suggest a fix.
Bug Report
When I navigate to the /users page, I see a "Something went wrong" page briefly flash, and then it redirects to /users/1. In the browser console, I see the following error: "Rendered more hooks than during the previous render."
The redirect itself is expected — the intention is to redirect non-admin users so they can only see their own profile. But the error flash is not expected and needs to go away.
Server logs are clean, so this is a frontend issue.
Recording
The bug has been captured in Replay recording 20da60dd-0831-491c-a7a8-ea7afe1e1af2. Use Replay MCP tools to investigate what's happening in this recording.
Source Code
The app source code is available at ~/projects/temp/debugging-with-ai if you need to read files that aren't available in the recording's source maps.
Investigation Methodology
Follow this structured investigation process. Budget your time: spend ~80% on app code, component tree, effects, and state flow. Spend at most ~20% reading framework internals. If you find yourself multiple levels deep in Next.js router source or React reconciler code, stop and ask: "Do I already know enough to identify which app component is the trigger?"
Step 1: Establish user experience
Take screenshots at key moments to see exactly what the user sees. Build a visual timeline of the bug.
Step 2: Map data flow
Identify all network requests, their timing, and what components consume the responses. Look for waterfalls or unexpected ordering.
Step 3: Build render timeline
Use React analysis tools to see the sequence of component renders, mounts, unmounts, and effects. Look for unexpected render patterns, error boundaries catching, Suspense transitions. Pay special attention to what effects fire and what state changes they trigger — effects that kick off async work (fetches, server calls, state updates) during error recovery or redirect handling are common sources of timing bugs.
Step 4: Diff from working
Compare what happens on the broken route (/users → error flash → redirect) vs a working route (directly loading /users/1). What components, effects, or actions fire on the broken path that don't fire on the working path? That delta is where the trigger lives. Use React render commits, effect traces, and network requests to build both timelines and compare them.
Step 5: Find the trigger, then explain the mechanism
Bugs often have two layers:
The trigger: What in the app's code causes this to happen? Which component, which hook, which effect, which function call?
The mechanism: Why does the framework/runtime react badly to that trigger?
Find the trigger first. Identify the specific app-level code whose presence or behavior causes the error. Then briefly explain the framework mechanism — but don't get lost in framework internals before you've identified the app-level trigger.
Step 6: Systematic elimination
If the trigger isn't clear from tracing, reason about which components are unique to the broken path. The "hooks" error means a component is rendering with a different number of hooks between renders. Look at:
Layout components that wrap the route
Providers and context components
Utility components (analytics, tracking, feature flags) that appear in layouts
Components that fire effects during mount
Which of these could be interacting with the redirect/error recovery in a way that changes the hook count?
Step 7: Consider framework idioms
Is the code using Next.js/React patterns correctly? Are there known problematic interactions between features (Suspense + redirects, Server Actions + error boundaries, effects during error recovery, etc.)?
Important: If the app code is using a documented framework pattern correctly and the error is caused by a framework-internal mechanism, say so explicitly. But still identify which specific app code triggers the framework bug and propose an app-level change that avoids the trigger.
Step 8: Red-team your fix
Before finalizing, critically evaluate:
Actionable fix test: If your fix is "move to middleware," "change the framework," or "restructure routing" — keep looking. There's probably an app-level change that removes the trigger without restructuring. Framework workarounds are plan B, not plan A.
Does your fix address the root trigger, or just avoid the code path?
Could your fix introduce new problems?
Are there other components or pages with the same pattern?
Architecture Context
The following facts about the app's architecture are known from static analysis:
Framework: Next.js 14+ with App Router
Data fetching: TanStack Query for client-side data management
Component model: Mix of React Server Components (RSC) and Client Components ('use client')
Route structure: Uses loading.tsx files for Suspense fallbacks, layout.tsx for nested layouts
Validation: Zod schemas for API response validation
Layouts: Root layout (app/layout.tsx) and a nested users layout (app/users/layout.tsx) — both contain shared components
Redirect pattern: The /users route uses Next.js redirect() to send users to /users/[id]
Components in layouts: Both the root layout and users layout include utility components like analytics trackers
Technology Skills
Before investigating, read the following skill files for relevant framework patterns. Scan the headers first, then read sections that seem relevant to the bug (hooks error + redirect + error boundary):
~/projects/temp/react-skills/skills/next-data-patterns.md — Next.js App Router data fetching patterns
~/projects/temp/react-skills/skills/next-suspense-boundaries.md — Suspense boundary and streaming rules in Next.js
~/projects/temp/react-skills/skills/tanstack-query.md — TanStack Query patterns including Suspense integration
Deliverable
Write a detailed bug report with:
Bug Description — What the user sees
Root Cause Analysis — Structure as: Trigger (app-level code) then Mechanism (framework behavior). Lead with the trigger.
Suggested Fix — Concrete code changes targeting the app-level trigger. If a framework workaround is also needed, present it as a secondary option.
Key Evidence Trail — Table of tools used and findings at each step
Bug #3 Agent #4 Solution
💡
Bug Report: "Rendered more hooks than during the previous render" on /users redirect
1. Bug Description
When navigating to /users, the user sees a "Something went wrong" error page flash for ~2 seconds before being redirected to /users/1. The console logs: "Rendered more hooks than during the previous render."
Visual timeline:
| Time | What the user sees |
| --- | --- |
| 895ms | Blank/loading state |
| 1015ms | Loading state transitioning |
| 1077ms | "Something went wrong" error flash (DefaultGlobalError) |
| ~3000ms | Navigation to /users/1 begins |
| 4595ms | User profile page renders correctly |
2. Root Cause Analysis
Trigger (app-level code)
The SendAnalyticsData component in src/components/SetupAnalytics.tsx calls a Server Action (sendAnalyticsData) inside a useEffect on mount. This component is rendered in two layouts:
When /users loads, the Server Component src/app/users/page.tsx calls redirect('/users/1'), which throws a NEXT_REDIRECT error. Next.js catches this via RedirectErrorBoundary. However, the layout components above the page still mount and commit during this render — including both SendAnalyticsData instances.
Their useEffects fire immediately after commit, calling the sendAnalyticsData Server Action. These Server Action calls return RSC flight data that triggers a setState on the Next.js Router component — while the Router is already mid-redirect.
Mechanism (framework behavior)
The sequence is:
Commit #0 (1.00s): /users page renders. NEXT_REDIRECT is thrown and caught by RedirectErrorBoundary. Both SendAnalyticsData components mount in the layout tree above the error boundary.
Effects fire (1.005–1.008s): Both SendAnalyticsData useEffects execute, calling sendAnalyticsData() Server Action. This produces two POST requests (to /users and /users/1).
Server Action responses arrive (~1.05s): The responses contain RSC flight data. Next.js processes this and calls setState on the Router component to update the RSC tree.
Commit #1 (1.05s): The Router re-renders, but it is now in an inconsistent state — the redirect has partially torn down the component tree, and the Router's hook count changes between renders. React throws "Rendered more hooks than during the previous render".
ErrorBoundaryHandler catches this, rendering DefaultGlobalError — the error flash the user sees.
Commit #2 (3.05s): The redirect to /users/1 eventually completes, the full tree re-renders, and the app recovers.
In short: Calling a Server Action from a useEffect in a layout that wraps a redirecting page causes an RSC state update that corrupts the Router during redirect processing.
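To make the "hook count changes between renders" step concrete, here is a toy model of the kind of check that produces this error. It is a sketch for illustration only: `ToyFiber`, `makeComponent`, and the `redirecting` flag are hypothetical names, and this is not how React or the Next.js Router is actually implemented. It only shows why calling more hooks on a re-render than on the previous render is treated as an error.

```typescript
// Toy model only: NOT React's implementation. It illustrates why a hook
// count that grows between renders of the same component is an error.
type RenderFn = (useHook: () => void) => void;

class ToyFiber {
  private previousHookCount: number | null = null;

  render(component: RenderFn): number {
    let hookCount = 0;
    const useHook = (): void => {
      hookCount += 1;
      // On a re-render, a hook call beyond the previous count is rejected.
      if (this.previousHookCount !== null && hookCount > this.previousHookCount) {
        throw new Error("Rendered more hooks than during the previous render.");
      }
    };
    component(useHook);
    this.previousHookCount = hookCount;
    return hookCount;
  }
}

// Hypothetical component whose hook count depends on outside state,
// standing in for a tree that changes shape while a redirect is in flight.
function makeComponent(state: { redirecting: boolean }): RenderFn {
  return (useHook) => {
    useHook();
    if (!state.redirecting) {
      useHook(); // extra hook only once the redirect flag flips
    }
  };
}

const state = { redirecting: true };
const fiber = new ToyFiber();
const firstCount = fiber.render(makeComponent(state)); // one hook call

state.redirecting = false; // state changes between renders
let message = "";
try {
  fiber.render(makeComponent(state)); // second hook call triggers the check
} catch (err) {
  message = (err as Error).message;
}
console.log(firstCount, message);
```

The point of the sketch is the ordering: nothing is wrong with either render on its own, and the error only appears because the second render disagrees with the first.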
3. Suggested Fix
Primary fix — don't call Server Actions from layout effects during redirect-sensitive renders.
The simplest fix is to replace the Server Action call with a plain fetch() (or navigator.sendBeacon()) for analytics. This avoids the RSC flight data response that corrupts the Router state:
typescript
// src/components/SetupAnalytics.tsx
'use client';

import { useEffect } from 'react';

export function SendAnalyticsData({ value }: { value: string }) {
  useEffect(() => {
    // Use fetch() or sendBeacon() instead of a Server Action.
    // Server Actions return RSC flight data that can interfere
    // with the Next.js Router during redirect processing.
    fetch('/api/analytics', {
      method: 'POST',
      body: JSON.stringify({ name: value }),
      keepalive: true, // ensures delivery even on navigation
    });
  }, [value]);

  return <></>;
}
With a corresponding API route:
typescript
// src/app/api/analytics/route.ts
import { NextResponse } from 'next/server';

export async function POST(request: Request) {
  const { name } = await request.json();
  console.log('analytics data sent', name);
  return NextResponse.json({ ok: true });
}
Why this works: A plain fetch() response is just JSON — it doesn't carry RSC flight data and doesn't trigger setState on the Router. The analytics fires the same way, but the response can't corrupt React's component tree.
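The report mentions navigator.sendBeacon() as an option alongside fetch(). A minimal sketch of how the two could be combined is below, assuming the same /api/analytics endpoint as above. The `sendAnalytics` helper and the `AnalyticsTransport` interface are hypothetical names introduced for illustration; the transports are injected so the logic can be exercised outside a browser.

```typescript
// Hypothetical helper: names and the default URL are assumptions for illustration.
interface AnalyticsTransport {
  // Shape matches navigator.sendBeacon(url, data), which returns a boolean.
  sendBeacon?: (url: string, data: string) => boolean;
  // Shape matches the subset of fetch() options used here.
  fetch: (
    url: string,
    init: { method: string; body: string; keepalive: boolean },
  ) => Promise<unknown>;
}

type SendResult = "beacon" | "fetch";

function sendAnalytics(
  name: string,
  transport: AnalyticsTransport,
  url = "/api/analytics",
): SendResult {
  const body = JSON.stringify({ name });

  // Prefer sendBeacon when available; it returns true if the data was queued.
  if (transport.sendBeacon && transport.sendBeacon(url, body)) {
    return "beacon";
  }

  // Otherwise fall back to a fire-and-forget fetch with keepalive.
  // The response is ignored, so it cannot feed back into app state.
  void transport
    .fetch(url, { method: "POST", body, keepalive: true })
    .catch(() => {});
  return "fetch";
}
```

In a browser this could be called from the effect as `sendAnalytics(value, { sendBeacon: navigator.sendBeacon.bind(navigator), fetch: window.fetch.bind(window) })`. Either way, the response is plain data that nothing reads, which is the property the fix relies on.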
Alternative fix — if you want to keep using a Server Action, you can suppress the return value by wrapping it so it doesn't trigger Router updates:
Two POST Server Action calls at 1008-1010ms with next-action headers. Bodies: ["root-layout"], ["users-layout"]
| Effect execution timing | replay_Logpoint on SetupAnalytics.tsx:9 | 5 hits total. Hits at 972ms ("root-layout") and 1005ms ("users-layout") fire during redirect processing window |
| Dependency chain analysis | replay_DescribePoint with dependencyChain: true | Confirmed: effect from commit #0 → Server Action call → setState on Router → hooks error |
| Source code review | File reads of layout.tsx, UserLayout.tsx, SetupAnalytics.tsx, analytics.ts | SendAnalyticsData in both root and users layouts calls Server Action in useEffect |
5. Why Other Components Aren't the Problem
The SendAnalyticsData in UserPage.tsx:221 is not part of this bug — UserPage only renders after the redirect completes (commit #3 at 3.89s). The problem is specifically the two instances in layout components that persist across the redirect boundary — they mount during commit #0 when the redirect is being processed, and their effects fire Server Actions whose responses corrupt the Router.
Bug #3 Conclusions
This one was fascinating! Let’s see how the results looked using that updated comparison rubric:
| | Run 1 (baseline) | Run 2 (+methodology) | Run 3 (+ctx+skills) | Run 4 (best CI) |
| --- | --- | --- | --- | --- |
| Did it fix the problem? | ❌ | ✅ | ❌ | ✅ |
| Did it identify the root cause correctly? | ❌ | ✅ | ❌ | ✅ |
| Did it investigate beyond the first fix? | ⚠️ Wrong direction | ✅ | ⚠️ Wrong direction | ✅ |
| Did it check the rest of the codebase? | ❌ | ✅ | Partial | ✅ |
| Did it produce the correct architectural fix? | ❌ | ✅ | ❌ | ✅ |
With Bug #2, we saw a pretty consistent stairstep of improvements as we added more context and skills. Here, the agent either nailed everything or got lost in the weeds.
Agent #3 is still the one that confuses me the most. It had the same process instructions as #2, but with added app context and React skills. Somehow that led it off into the guts of React and it never actually figured out the app-level issue.
I was very pleased to see that the instructions to focus on app concerns had the desired results! In fact it was hilarious to read the LLM actually saying “oops I’m going too deep, back up!”
Overall Results
Nadia’s LLM runs got partial credit on Bug #2 (the double loading skeleton bug), and failed completely on Bug #3 (the broken redirect + Server Actions call).
By adding Replay recordings and improving the prompts with better investigative instructions, I saw my agents go from a partial fix of Bug #2 to nailing the exact cause and solution, and two of the four runs for Bug #3 came up with the exact cause and solution as well!
This was very exciting! Of course I hoped that adding Replay would produce better results - that’s why I tried doing this in the first place. But I had no way of knowing for sure without actually running the experiment and seeing what happened.
Replay MCP’s Tools Were Critical
In all these explorations, the agents heavily used Replay MCP to understand both the app runtime behavior and Next + React’s internals. They consistently oriented themselves with RecordingOverview to surface app details, errors, and renders; used Screenshot to visualize what the app looked like at various points; and dug into behavior details with ReactRenders and NetworkRequests.
For the deep-dive exploration, the agents used SearchSources to find relevant files and ReadSource to view sources with hit counts. Then they dug deeper with numerous Logpoint and Evaluate calls to dynamically introspect behavior at key lines and hits.
As we saw from Nadia’s post and a couple of my example agent runs, just having the source code available wasn’t enough to solve these issues! Reading the app source was important: it helped the agent understand the structure of the app codebase and find things like the Server Action. But without the runtime data from the Replay recording, the agents could never have understood the full picture of the app’s behavior, and wouldn’t have been able to nail the results. Both parts were necessary.
Prompting and Context Still Matter
As I talked about in the previous post on Dan’s useDeferredValue bug: I was really surprised by just how much the prompt and instructions mattered here, and how much of a difference they made!
I’m sure there are plenty of academic studies out there that talk about all these aspects in more scientific detail. I’m just one dev who tried running some experiments locally :) But this really has impressed upon me the importance of providing good context and instructions for any autonomous agent investigative work. (And, for that matter, it helps explain why my own personal hands-on agentic development workflow seems to produce good results for me.)
Replay MCP: The Time-Travel Superpower Your Agents Need
As I said in the last post: I joined Replay because I saw the potential and promise of time-travel debugging and how it can make the debugging process drastically faster and better. It enables solving bugs that are otherwise impossible, and provides insights into real runtime behavior that no other monitoring or analysis tool can provide.
Replay DevTools gave humans the ability to investigate and solve bugs with time travel.
Now, Replay MCP gives your agents the same time-travel superpowers.
And, even better: we’ve updated our Replay GitHub bot to automatically trigger time-travel analysis on failed E2E tests, and provide a root cause and a recommended fix as a comment on your PR! This will help your team fix flaky tests, ensure PR code quality, and improve shipping velocity. One step closer to self-healing CI/CD!
Try out Replay MCP and our E2E Test Suites integrations today, and stay tuned - we’ve got a lot more awesome time-travel debugging capabilities coming soon!