Comment by iamflimflam1
17 hours ago
Definitely. What is missing from many of these discussions is the absolutely essential need to have evals.
The only way to “know” what is the best (or better) approach is to have a significant number of test cases that you can measure performance against.
At the moment, for a lot of people, the state of the art is “let’s try a different prompt and see if the answer on my one example is better.”
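To make that concrete, here is a rough sketch of the smallest possible eval loop: score each prompt variant against the same fixed set of cases instead of eyeballing one example. Everything in it is made up for illustration: `CASES`, `PROMPTS`, and `stub_model` are hypothetical, and `stub_model` is a toy stand-in you would replace with a call to whatever model you actually use.

```python
from typing import Callable

# Hypothetical test cases: (question, expected answer).
# A real suite would need far more than four.
CASES = [
    ("2 + 2", "4"),
    ("3 * 5", "15"),
    ("10 - 7", "3"),
    ("9 / 3", "3"),
]

# Hypothetical prompt variants to compare.
PROMPTS = {
    "terse": "Answer:",
    "careful": "Think step by step, then answer:",
}


def stub_model(prompt: str, question: str) -> str:
    """Toy stand-in for a real model call (assumption, not a real API).

    It fails on division unless the prompt asks for step-by-step work,
    just so the two prompts score differently when run offline.
    """
    if "step by step" not in prompt and "/" in question:
        return "unknown"
    return dict(CASES)[question]


def accuracy(prompt: str, model: Callable[[str, str], str] = stub_model) -> float:
    """Fraction of cases where the model output exactly matches the expected answer."""
    correct = sum(model(prompt, q).strip() == expected for q, expected in CASES)
    return correct / len(CASES)


if __name__ == "__main__":
    for name, prompt in PROMPTS.items():
        print(f"{name}: {accuracy(prompt):.2f}")
```

Exact match is the crudest possible scorer; the point is only that both prompts get measured on the same cases, so "better" becomes a number rather than a hunch.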