Comment by refulgentis

1 month ago

"...as even well tempered discussion about the rest would be against the guidelines anyways."

Didn't bother reading after that. I deeply respect you have the self-awareness to notice and spare us, that's rare. But it also means we all have to have conversations purely on your terms, and because it's async, the rules constantly change post-hoc.

And that's on top of the post-hoc motte / bailey instances, of which we have multiple. I was stunned (stunned!!) by the attempted retcon of the app claim once there were numbers.

Anyways, all your bêtes noires aside, all your Red Team vs. Blue Team signalling aside, using LMArena alone as a benchmark is a bad idea.

The conversation is certainly not on "my terms" as I didn't write the guidelines (nor do they benefit me more than anyone else). If you are genuinely concerned with the conversation, please flag it and/or email hn@ycombinator.com and they will (genuinely) handle it appropriately. Otherwise, there is not much else that can be said about this here.

If not, continuing to have a conversation can only happen if we want to discuss the recent growth rate of AI and take the time to read what each other writes. Similarly, async conversation can be as clear and consistent as we want it to be - we just have to take the time to ask for clarification before writing a response on something we feel could be a movable understanding. Nothing is meant to be unclear as a "gotcha" and I'll always be glad to clarify before moving on.

I also agree nobody should rely solely on LM Arena for benchmarks; opening the conversation with it as an example was not meant to imply that we should. I'd love to continue chatting more about other benchmarks and how you see Tao's comments, as you seem to have walked away from reading them with a very different understanding than I did.