Comment by hedgehog
1 month ago
This looks pretty solid. I think you can make this process more efficient by decomposing the problem into layers that are more easily testable, e.g. testing topological relationships of DOM elements after parse, then spatial relationships after layout, then eventually pixels on things like ACID2 or whatever the modern equivalent is. The models can often come up with tests more accurately than they get the code right the first time. There are often also invariants that can be used to identify bugs without ground truth, e.g. rendering the page with slightly different widths you can make some assertions about how far elements will move.
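To make the width-invariant idea concrete, here is a minimal, self-contained sketch built around a toy greedy line-wrapping "layout engine" (everything below is a hypothetical stand-in, not code from any real browser). For greedy wrapping, two invariants hold with no reference rendering needed: widening the container never increases the total line count, and never pushes a box onto a later line.

```python
def layout(box_widths, container_width):
    """Toy greedy line wrapping: place each box on the current line,
    wrapping to the next line when it would overflow the container.
    Returns a (line, x) position for each box."""
    positions = []
    line, x = 0, 0
    for w in box_widths:
        if x > 0 and x + w > container_width:
            line, x = line + 1, 0
        positions.append((line, x))
        x += w
    return positions

def check_width_invariants(box_widths, narrow, wide):
    """Invariants that need no ground truth: widening the container
    must not increase the line count, and must not move any box
    onto a later line."""
    assert wide >= narrow
    p_narrow = layout(box_widths, narrow)
    p_wide = layout(box_widths, wide)
    # Total line count is monotone non-increasing in container width.
    assert max(l for l, _ in p_wide) <= max(l for l, _ in p_narrow)
    # Each individual box ends up on the same or an earlier line.
    for (l_n, _), (l_w, _) in zip(p_narrow, p_wide):
        assert l_w <= l_n

# Sweep a range of widths, comparing each against a slightly wider one.
boxes = [30, 40, 25, 50, 10, 35]
for w in range(60, 200, 7):
    check_width_invariants(boxes, w, w + 5)
```

The point is that neither assertion needs a known-correct rendering to compare against; a layout bug that shuffles boxes between lines tends to violate one of them at some width pair.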
> There are often also invariants that can be used to identify bugs without ground truth, e.g. rendering the page with slightly different widths you can make some assertions about how far elements will move.
That's really interesting and sounds useful! I'm wondering if there are general guidelines/requirements (not specific to browsers) that could kind of "trigger" those things in the agent, without explicitly telling it. I think generally that's how I try to approach prompting.
I think if you explain that general idea, the models can figure it out well enough to write it into an implementation plan, at least some of the time. Interesting problem though.
> that general idea, the models can figure it out well enough to write it into an implementation plan
I'm not having much luck with it; they get lost in their own designs/architectures all the time, even the best models I've tested. But as long as I drive the design, things don't end up in a ball of spaghetti immediately.
I'm still trying to figure out better ways of doing that. It feels like we need to focus on tooling that lets us collaborate with LLMs better, rather than trying to replace things with LLMs.
the modern equivalent is the Web Platform Tests
https://web-platform-tests.org/
Amazing. I think if I were taking on the build-a-browser project, I would pair that with the WHATWG HTML spec to come up with a task list (based on the spec line by line), linked to the specific tests associated with each task. Then of course you'd need an overall architecture and a behavioral spec for how the browser behaves beyond just rendering. A developer steering the process full time might be able to get within 80% parity of existing browsers in a month. It would be an interesting experiment.
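As a rough sketch of what that pairing might start from, one could group the test paths in a WPT checkout into per-area tasks. The helper and sample paths below are hypothetical illustrations of the idea, not the actual WPT directory layout or metadata format:

```python
from collections import defaultdict
from pathlib import PurePosixPath

def tasks_from_wpt_paths(paths):
    """Group test file paths by top-level directory into a rough
    per-area task list. (Sketch only: real WPT metadata is much
    richer than a directory prefix.)"""
    tasks = defaultdict(list)
    for p in paths:
        top = PurePosixPath(p).parts[0]
        tasks[top].append(p)
    return {area: sorted(tests) for area, tests in tasks.items()}

# Hypothetical example paths in WPT's directory style:
sample = [
    "html/syntax/parsing/a.html",
    "html/dom/nodes/b.html",
    "css/css-flexbox/c.html",
]
```

Each area's list could then be cross-referenced against the matching spec sections to produce the task list, with the tests serving as the acceptance criteria per task.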
> I would pair that with the WhatWG HTML spec
I placed some specifications + WPT into the repository the agent had access to! https://github.com/embedding-shapes/one-agent-one-browser/tr...
But judging by the session logs, it doesn't seem like the agent ever saw them: I never pointed it there, and it seems none of its searches returned anything from that directory.
I'm slightly curious about doing it from scratch again, but this time explicitly pointing the agent to the specifications, to see whether it gets better or worse.