Comment by anonymoushn
5 hours ago
The framing in this post is really weird. Automated evals can be much more informative than unit tests because the results can be much more fine grained. A/B testing in production is not suitable for determining whether all of one's internal experiments are successful or not.
I don't doubt that Raindrop's product is worthwhile to model vendors, but the post seems like its audience is C suite folks who have no clue how anything works. Do their most important customers even have any of these?
I think in most cases, outside of pure AI providers or think AI wrappers, almost every team will realize more gains from focusing on their user domains and solving business problems versus fine tuning their prompts to eek out a 5% improvement here and there.
I don’t think you can use this as a blanket statement. For many use cases the last 5-10% is the difference between demoware and production.