Comment by theahura

2 months ago

I was able to get Claude to do this, though it kinda sorta cheated. Blog post describing the output here: https://theahura.substack.com/p/i-successfully-recreated-the...

TLDR:

"The plan is designed to ‘autoformalize’ the problem by using Test Driven Development (TDD). TDD is incredibly important for getting good outputs from a coding agent, because it helps solve the context rot problem. Specifically, if you can write a good test when the model is most ‘lucid’, it will have an easier time later on because it is just solving the test instead of ‘building a feature’ or whatever high dimensional ask you originally gave it.

From here, Nori chugged away for the better part of half an hour in yolo mode while I went to do other things. And eventually I got a little pop up notification saying that it was done. It had written a playwright test that would open an html file, screenshot it, diff it with the original screenshot, and output the final result...

After trying a few ways to get the stars to line up perfectly, it just gave up and copied the screenshot in as the background image, then overlaid the rest of the HTML elements on top.

I’m tempted to give this a pass for a few reasons.

This obviously covers the original use case that tripped up Jonah.

It also is basically exactly what I asked the model to do — that is, give me a pixel perfect representation — so it’s kind of my fault that I was not clearer.

I’m not sure the model actually can get to pixel perfect any other way. The screengrab has artifacts. After all, I basically just used the default linux screenshot selection tool to get the original output, without even paying much attention to the width of the image.

If you ask the model to loosen the requirements for the exact screengrab, it does the right thing, but the pixel alignment is slightly off. The model included this as index_tiled.html in the repo, and you can see the pixel diff in one of the output images..."

0 comments

theahura

No comments yet

Contribute on Hacker News ↗