Comment by planb

8 months ago

And by a sample that has become increasingly well known as a benchmark. Newer training data will contain more articles like this one, which naturally improves an LLM's ability to estimate what's considered a good "pelican on a bike".

criddell 8 months ago

And that’s why he says he’s going to have to find a new benchmark.

viraptor 8 months ago

Would it though? There really aren't that many valid answers to that question online. When this is talked about, we get more broken samples than reasonable ones. I feel like any talk about this actually sabotages future training a bit.

I actually don't think I've seen a single correct SVG drawing for that prompt.

cyanydeez 8 months ago

So what you really need to do is clone this blog post, find and replace pelican with any other noun, run all the tests, and publish that.

Call it wikipediaslop.org

YuccaGloriosa 8 months ago

If the "any other noun" becomes fish... I think I disagree.