Comment by iamflimflam1

12 hours ago

Definitely. What is missing from many of these discussions is the absolutely essential need for evals.

The only way to “know” which approach is best (or even just better) is to have a significant number of test cases that you can measure performance against.

At the moment, for a lot of people, the state of the art is “let’s try a different prompt and see if the answer on my one example is better.”
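
To make that concrete, here is a rough sketch of the smallest possible eval loop. Everything in it is made up for illustration: `CASES`, `approach_a`, and `approach_b` are hypothetical stand-ins (the two functions take the place of "model plus prompt A" and "model plus prompt B"), and exact-match scoring is just one simple choice of metric.

```python
from typing import Callable, List, Tuple

# Hypothetical test cases: (input, expected answer).
# A real suite would have far more than four.
CASES: List[Tuple[str, str]] = [
    ("2+2", "4"),
    ("3*3", "9"),
    ("10-7", "3"),
    ("8/2", "4"),
]


def accuracy(answer_fn: Callable[[str], str], cases: List[Tuple[str, str]]) -> float:
    """Fraction of cases where the answer exactly matches the expected output."""
    correct = sum(
        1 for question, expected in cases
        if answer_fn(question).strip() == expected
    )
    return correct / len(cases)


# Stand-in for "model + prompt A": always answers "4".
def approach_a(question: str) -> str:
    return "4"


# Stand-in for "model + prompt B": handles three of the four cases.
def approach_b(question: str) -> str:
    canned = {"2+2": "4", "3*3": "9", "10-7": "3"}
    return canned.get(question, "unknown")


# Score every approach against the same set of cases.
scores = {
    name: accuracy(fn, CASES)
    for name, fn in [("A", approach_a), ("B", approach_b)]
}

for name, score in scores.items():
    print(f"approach {name}: {score:.2f}")
```

If "2+2" were your one example, both approaches would look identical. Only running them over the whole set shows that they differ.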