Comment by o10449366

14 hours ago

[flagged]

8 comments

o10449366

"Quirky and obscure" has the functional benefit of ensuring the source question is not in the training data and falls outside the median user prompt, which makes the model less likely to cheat.

We have enough people complaining about Simon Willison's pelican test.

o10449366 8 hours ago

When you program, do you consider using your prior knowledge of programming cheating?

Bjartr 12 hours ago

What would make the prompt a better actual evaluation in your judgement?

leptons 9 hours ago

Not focusing on Pokémon, for a start. Maybe use something more people can recognize and evaluate. I have zero knowledge of Pokémon; I see it as a niche thing for ultra-nerdy people, not something everyone is familiar with. Nothing about that test can be evaluated by anyone but a Pokémon expert. Sorry, but Pokémon isn't as mainstream as some people might think it is.

tailscaler2026 13 hours ago

still #opentowork huh

beepbooptheory 11 hours ago

Where does one even use that hashtag?

minimaxir 10 hours ago

It's a LinkedIn joke.

codemog 13 hours ago

Ah yes, also known as C++ enjoyers.