Comment by ponyous
7 days ago
I am building pretty much the same product as OP, and have a pretty good harness to test LLMs. In fact I have run a tons of tests already. It’s currently aimed for my own internal tests, but making something that is easier to digest should be a breeze. If you are curious: https://grandpacad.com/evals
No comments yet
Contribute on Hacker News ↗