Show HN: Autocrit – an agent loop that builds and tests web prototypes

5 hours ago (github.com)

Hi HN, this is my first Show HN submission.

I've been thinking about how product development will change with AI. The earliest stages of product development have so much ambiguity. Because code was costly to write, we spent a lot of time writing specs and doing user research.

I thought I'd try an experiment after (a) seeing advancements in evaluation systems, especially for UX, (b) realizing that AI can create a reasonably good representation of a persona/end user, and (c) seeing karpathy's autoresearch project.

Autocrit is a pi extension. Start the pi harness and run the autocrit skill. It will ask you for a high-level app idea and a definition of the target user. Autocrit creates a persona definition, has that persona write evaluation tasks, and then starts a loop: a coding agent builds a prototype, and a persona agent tries to use it in a real browser. The persona judges the prototype against the tasks, giving scores and verbatim feedback. The coding agent creates a plan to fix things, keeps improvements, reverts bad ideas, and so on. The loop runs overnight.
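The build/evaluate/keep-or-revert loop can be sketched roughly like this (a minimal illustration only; all names here are hypothetical, and the real skill runs inside the pi harness and drives actual coding and browser agents):

```python
def evaluate(persona_agent, prototype, tasks):
    # Average of the per-task scores the persona agent assigns.
    return sum(persona_agent.score(prototype, t) for t in tasks) / len(tasks)

def run_autocrit_loop(build_agent, persona_agent, tasks, iterations=5):
    """Sketch of the loop: the coding agent proposes a revised prototype
    based on persona feedback; a revision is kept only if the persona's
    average task score improves, otherwise it is reverted (discarded)."""
    prototype = build_agent(None, feedback=None)   # initial build
    best_score = evaluate(persona_agent, prototype, tasks)
    for _ in range(iterations):
        feedback = persona_agent.critique(prototype, tasks)
        candidate = build_agent(prototype, feedback=feedback)
        score = evaluate(persona_agent, candidate, tasks)
        if score > best_score:
            prototype, best_score = candidate, score  # keep the improvement
        # else: revert, i.e. carry the previous prototype forward
    return prototype, best_score
```

The key design choice is hill climbing on the persona's scores: bad revisions are cheap to throw away, so the overnight loop can afford to explore.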

The goal is to get a better understanding of where to take the product at an early stage (think paper prototypes), before actually starting to build it. The prototype-and-get-feedback loop is automated here, but humans still provide the persona definition, the app idea and product goals, and the hypotheses that need validation.

The ambiguity doesn't go away with specs; it just gets deferred to when the code is written and you discover nobody wanted the thing you specified. If AI can shrink the loop between "I think this is a problem" and "someone who has this problem told me it isn't", that's genuinely useful. That loop is the part that kills most early products, not the building itself.

  • Well said. Yes, it's more about validating hypotheses about what you're trying to build.

    My own hypothesis is that we can create a rough representation of your end user and build an agent loop around it to validate those hypotheses. It won't work for the stages of product development where you need real humans to give you feedback, but it's likely useful in the beginning stages.