Comment by beefnugs
15 days ago
To me, these kinds of tests aren't complete until they carry the concept through to a full solution:
- Start just as they have here
- Keep improving the prompts in a wide variety of ways to see what improvements can be made
- Generate more and more code to complete a growing percentage of the work, instead of relying on textual prompting
- Fix the worst parts with real human-written code and tools
- Finally, show a fully working solution that does well, with a full analysis of what kind of human intervention was necessary, and even explore what kind of prompting could produce that human-intuition tooling, going to whatever lengths necessary to hand-hold the models in the right direction

Otherwise... I don't get the point of stopping and saying "doesn't do great."