Comment by bgwalter
5 months ago
It is not historical:
https://kieranhealy.org/blog/archives/2025/08/07/blueberry-h...
Perhaps they have a hotfix that special-cases HN complaints?
They clearly RLHF out the embarrassing cases and make cheating on benchmarks into a sport.
I wouldn't be surprised if some models get set up to identify that type of question and run the word through a string-processing function.
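To make that concrete, here is a rough sketch of what such a special case could look like. This is purely a guess for illustration, not anything a vendor is known to ship: the regex, the `LETTER_COUNT_RE` name, and the `answer_letter_count` function are all made up here.

```python
import re
from typing import Optional

# Hypothetical pattern for questions such as "How many b's are in blueberry?"
# Group 1 is the single letter being counted, group 2 is the word to search.
LETTER_COUNT_RE = re.compile(
    r"how many\s+['\"]?([a-z])['\"]?(?:'s|s)?\s+(?:are\s+)?in\s+"
    r"(?:the\s+word\s+)?['\"]?([a-z]+)",
    re.IGNORECASE,
)

def answer_letter_count(question: str) -> Optional[int]:
    """Return the letter count if the question matches the pattern, else None."""
    match = LETTER_COUNT_RE.search(question)
    if match is None:
        return None  # not a letter-counting question; fall through to the model
    letter, word = match.group(1).lower(), match.group(2).lower()
    # Plain string processing, no model involved.
    return word.count(letter)

print(answer_letter_count("How many b's are in blueberry?"))
```

Anything that doesn't match the pattern returns `None`, which is where the normal model path would take over in this imagined setup.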
Of course they do stuff like that, otherwise it would look like they are stagnating. Fake it till you make it. Though, at this point, the world is in deep shit if they don’t make it…