Comment by anonymoushn

3 months ago

The framing in this post is really weird. Automated evals can be much more informative than unit tests because their results can be much more fine-grained. A/B testing in production is not a suitable way to determine whether each of one's internal experiments succeeded.

I don't doubt that Raindrop's product is worthwhile to model vendors, but the post seems aimed at C-suite folks who have no clue how anything works. Do their most important customers even have any of these?


CharlieDigital 3 months ago

I think in most cases, outside of pure AI providers or thin AI wrappers, almost every team will realize more gains from focusing on their user domains and solving business problems than from fine-tuning their prompts to eke out a 5% improvement here and there.

basket_horse 3 months ago
I don’t think you can use this as a blanket statement. For many use cases the last 5-10% is the difference between demoware and production.
- CharlieDigital 3 months ago
  
  If that were true, just switching to TOON would make your startup take off.
  That is obviously not true because a 5% gain in LLM performance isn't going to make up for a bad product.