Comment by aleksiy123

6 days ago

It’s also just a useful exercise in general, especially for getting feedback on models and harnesses.

I’ve been thinking about setting up a non-trivial project to use as a benchmark for any plugin and/or harness changes I make.

Having a prebuilt verification suite is great. You can use it to assess things like token usage and time across different harnesses, models, and plugins.
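
To make that concrete, here is a rough sketch of what such a comparison loop could look like. Everything in it is hypothetical: `RunResult`, `benchmark`, and the two `fake_harness_*` functions are placeholder names, and the fake runs return hard-coded output and token counts where a real setup would call an actual harness and read usage from its API. The `verify` callback stands in for the prebuilt verification suite.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class RunResult:
    config: str      # label for the harness/model/plugin combination
    passed: bool     # did the verification suite accept the output?
    tokens: int      # token usage reported by the run
    seconds: float   # wall-clock time for the run


def benchmark(
    configs: Dict[str, Callable[[], Tuple[str, int]]],
    verify: Callable[[str], bool],
) -> List[RunResult]:
    """Run each configuration once and record pass/fail, tokens, and time."""
    results = []
    for name, run in configs.items():
        start = time.perf_counter()
        output, tokens = run()  # each run returns (output, tokens_used)
        elapsed = time.perf_counter() - start
        results.append(RunResult(name, verify(output), tokens, elapsed))
    return results


# Hypothetical stand-ins for real harness runs (assumption: a real run
# would invoke a model and report its own token count).
def fake_harness_a() -> Tuple[str, int]:
    return ("4", 120)


def fake_harness_b() -> Tuple[str, int]:
    return ("5", 80)


results = benchmark(
    {"harness-a": fake_harness_a, "harness-b": fake_harness_b},
    verify=lambda out: out == "4",  # placeholder for the verification suite
)

for r in results:
    status = "pass" if r.passed else "fail"
    print(f"{r.config}: {status}, {r.tokens} tokens, {r.seconds:.4f}s")
```

The point of keeping the verification step separate from the runs is that the same suite can score every configuration, so differences in tokens and time are comparable only among runs that actually pass.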