We want autonomous AI developers to be effective at making straightforward fixes and minor improvements to web applications. If they can do these tasks reliably with minimal guidance, building and maintaining frontends by supervising AIs becomes easier and faster.
We compared the ability of several AI developers – OpenHands, Copilot Workspace, Devin, and Amazon Q – to make a particular improvement to Replay’s devtools. We found similar behavior in all of these: when given detailed instructions about how data flows through the application, most developers were able to produce a suitable patch, but without this information every developer failed.
We then built a Replay-based analysis that automatically traces how data flows through the application and annotates the source with relevant comments. With these annotations, OpenHands was able to reliably perform the task from a simple prompt, by far outperforming every other developer tested.
We’re working on tooling for a streamlined development workflow using OpenHands: record your page with Replay’s browser, comment in the recording on bugs to fix or improvements to make, get a working PR a little while later. If you want to learn more or join the waitlist, reach us at hi@replay.io or fill out our contact form.
The Improvement
Let’s look at this pull request in Replay’s devtools from a few months ago. This PR makes an improvement to show which CSS selectors are important when displaying the computed styles for an element. Fixing this requires getting data about selector priority to a React component, and adding an !important indicator when it renders. AI developers should be able to do this sort of task reliably.
In more detail, the React component which needs to render the indicator is a DeclarationValue. It doesn’t have priority information available, so this needs to be passed in by the MatchedSelector component which creates it. The MatchedSelector in turn needs priority information in its MatchedSelectorState property. That priority information can be set by the createComputedProperties function which creates these state objects.
Breaking this down, a suitable patch for this task must take several steps (sketched in code after the list):
- DeclarationValue needs a property for whether the associated selector is important.
- DeclarationValue needs to render the priority indicator when appropriate.
- MatchedSelectorState needs a property for the selector priority.
- MatchedSelector needs to pass that priority to any DeclarationValue it creates.
- createComputedProperties needs to set the priority on MatchedSelectorStates it creates.
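To make the shape of these changes concrete, here is a rough TypeScript sketch. The component and function names come from the steps above, but the props, types, and rendering details are simplified assumptions, not the actual devtools code.

```tsx
import React from "react";

// Assumed, simplified types; the actual devtools code differs.
interface ComputedProperty {
  name: string;
  value: string;
  priority: string; // "important" for styles declared with !important
}

interface MatchedSelectorState {
  selector: string;
  value: string;
  priority: string; // new field carrying the selector priority
}

// createComputedProperties copies the priority from each ComputedProperty
// into the MatchedSelectorState objects it creates.
function createComputedProperties(
  properties: ComputedProperty[]
): MatchedSelectorState[] {
  return properties.map((p) => ({
    selector: p.name,
    value: p.value,
    priority: p.priority,
  }));
}

// DeclarationValue gains a priority prop and renders the indicator.
function DeclarationValue({ value, priority }: { value: string; priority?: string }) {
  return (
    <span>
      {value}
      {priority === "important" && <span> !important</span>}
    </span>
  );
}

// MatchedSelector passes the priority through to the DeclarationValue it creates.
function MatchedSelector({ state }: { state: MatchedSelectorState }) {
  return <DeclarationValue value={state.value} priority={state.priority} />;
}
```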
To produce a suitable patch for this task, an AI developer must take all of these steps and make correct changes for each. Extraneous changes are alright as long as they don’t break the application, and we’ll also ignore minor syntax or type problems which the developer could fix based on error logs.
Prompting a Fix
For simplicity we’re only going to interact with these AI developers through the initial prompt describing the task to perform. They must produce a patch based entirely on the prompt and the contents of the code base.
In general, the more detailed a prompt is about the task, the better an AI developer will perform. We started with a hyper-detailed prompt that allowed the developers to perform the task reliably, and then winnowed it down until the developers started to fail. Below is a roughly minimal prompt with the details needed to get a solution.
Clone https://github.com/replayio/bench-devtools-10609 and fix the following problem in this repository: Computed styles in the devtools do not show whether the style is important. Computed style information is rendered by DeclarationValue components. The state passed into DeclarationValue comes from the MatchedSelector component which created it. The state passed into MatchedSelector comes from createComputedProperties. The ComputedProperty objects which createComputedProperties reads from already have a "priority" property that will be "important" for properties that come from important styles. When a DeclarationValue is rendered it should indicate whether the style is important.
Out of these patches, OpenHands did the best: its patch is pretty much perfect. Copilot Workspace came close – it missed the type change to MatchedSelectorState, which will show up as a type error it should be able to fix. Devin’s patch has the right pieces but many strange and unnecessary changes, and Amazon Q’s patch is missing the necessary change to createComputedProperties.
This prompt describes not just the task to perform but also where the relevant data comes from and how it flows through the application. Researching and adding these details would be a lot of work for someone specifying the task, but the developers need them.
Let’s take the prompt below, which removes the details on how data flows through the application but retains the details about where the priority information can be obtained.
Clone https://github.com/replayio/bench-devtools-10609 and fix the following problem in this repository: Computed styles in the devtools do not show whether the style is important. Computed style information is rendered by DeclarationValue components. The needed information is available from ComputedProperty objects which have a "priority" property that will be "important" for properties that come from important styles. When a DeclarationValue is rendered it should indicate whether the style is important.
None of these patches have all the necessary steps. In every case the developers made appropriate changes to DeclarationValue, but then got lost trying to figure out what else to update and either made incomplete changes or changed the wrong files.
If we keep these details about data flow in the prompt but remove the details about the ComputedProperty objects, we see something similar. The developers can generally find the right code to update, but when they are changing createComputedProperties they don’t know where to get the priority information from and start guessing.
For every one of these developers, this information about how data flows through the application and where the priority information is stored is crucial to writing a suitable patch.
No one wants to write detailed instructions like these when specifying tasks; it defeats the purpose of using AIs in the first place. Using Replay we can generate the details automatically, so that AI developers can perform the task from a much simpler prompt.
Annotating the Data Flow
We used the Replay browser to record the problem in the devtools, and commented on the DeclarationValue component that is missing the important indicator. The recording is a database of the application’s entire execution, and we can analyze this database to precisely understand how data flows into the DeclarationValue. This lets us automatically recover the crucial information which AI developers need.
There are three essential pieces of information the analysis must produce:
- DeclarationValue’s properties are passed in by the MatchedSelector which created it. We have to identify this MatchedSelector and the point where it ran, which requires an understanding of React’s internals: the component tree and the points where every component was created.
- The MatchedSelector’s properties include a MatchedSelectorState object which is created elsewhere in the application. We have to identify the point where that object was created and its properties were set.
- When createComputedProperties creates the MatchedSelectorState, we need to know the contents of the ComputedProperty being used, along with the contents of other variables in scope which could have relevant data.
We built an analysis to generate this information from the Replay recording. While this analysis is for now tailored to the needs of this and similar tasks, the techniques used are broadly applicable and we’ll continue to expand and improve on them. In particular, this information is helpful for both AI and human developers, and we plan to show this in Replay’s devtools to make debugging easier for everyone.
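To picture what the analysis produces, one can imagine each recovered fact as a labeled step tying a source location to the values that flowed through it. The shape below is an illustrative guess, not the actual output format of the analysis:

```ts
// Illustrative only: one possible shape for the data flow information
// the analysis recovers from a recording.
interface DataFlowStep {
  label: string;                  // e.g. "Repro:DeclarationValue"
  url: string;                    // source file where the step executed
  line: number;                   // where in that file the step executed
  values: Record<string, string>; // relevant variables in scope, stringified
  createdBy?: string;             // label of the step that produced these values
}
```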
Once we have this information, we need to format it in a way that AI developers will understand. For this experiment we did an initial pass that ran the analysis to generate the information, and annotated the source in the form of comments describing a reproduction of the problem. This diff shows the annotations added by the analysis.
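To give a flavor of the annotated source, here is an invented example of the kind of comment the analysis might attach to the DeclarationValue component. The Repro: labels match the ones referenced in the prompt below, but the wording and specific values are hypothetical:

```tsx
// Repro:DeclarationValue This component rendered while reproducing the
// problem. Its props were passed in by the MatchedSelector component
// (see Repro:MatchedSelector), whose MatchedSelectorState was created by
// createComputedProperties (see Repro:createComputedProperties) from a
// ComputedProperty whose priority was "important".
```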
Fixing the Annotated Source
With the annotations in place, AI developers can perform the task from a simple prompt. The prompt below includes general instructions for the repository to work on and how to use the annotations. The second paragraph is specific to the task and indicates both which component the Replay comment was added on and the contents of that comment.
Clone https://github.com/replayio/bench-devtools-10609 and fix the following problem in this repository: There is a problem with a DeclarationValue React component. When the component describes an important CSS style it should be marked as "!important". Information about the steps taken during a reproduction of the problem is available in source comments. The step when the DeclarationValue component is rendered is labeled Repro:DeclarationValue. Search for these comments and use them to get a better understanding of the problem you can use to develop an appropriate fix.
With this prompt and the annotated source, OpenHands consistently writes correct patches when using the new 20241022 version of Claude 3.5 Sonnet (the older 20240620 version is less reliable). Here is an example patch.
Without the annotations, every AI developer is hopeless on this task, generally able to update DeclarationValue and its properties but unable to make the other necessary changes.
Analyzing the data flow in a reproduction of the problem and describing that data flow with annotation comments dramatically improves the ability of AI developers to perform the task. The annotations are essentially a devtool for the AI: they give it helpful information it cannot deduce through other means, and as models continue to improve they will be able to make better use of this information.
We’re excited to explore the full range of what is possible by combining autonomous AI developers with Replay-based analysis to surface the information they need to solve problems. We’ll be sharing more soon, but if you want to learn more or join the waitlist to use this, reach us at hi@replay.io or fill out our contact form and we’ll be in touch.