Comment by smlacy

5 hours ago

Post actual results, make a blog post. Don't just say "this sucks" without tangible evidence.

Otherwise you're doomed to a "sample size of one" level of relevance.

I have the opposite experience: random HN/Reddit comments saying “this sucks” or “whoa this is a huge improvement” are the only benchmark that means anything. Standard benchmarks are all gamed and don’t capture the complexity of the real world.

Then your internal benchmarks will be in the post-training set and you’ll have to make new ones.