Comment by dmitrygr

7 months ago

My (unreleased) cat did even better than the OpenAI model. No, you cannot see. Yes, you have to trust me. Now gimme more money.

3 comments

klabb3 7 months ago

Wow, that’s incredible. Cats are progressing so fast, and unreleased cats especially seem to be doing much better. My two orange kitties aren’t doing well on math problems, but obviously that’s because I’m not prompting the right way – any day now. If I ever get it to work, I’ll be sure to share the achievements on X, while carefully avoiding explaining how I did it or providing any data that could corroborate the claims.

raincole 7 months ago

I don't know the details (of course, it's unreleased), but note that MathArena evaluated "average of 4 attempts" and limited token usage to 64k.

OpenAI likely had unlimited tokens, and evaluated "best of N attempts."
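
To make the gap between the two scoring rules concrete, here is a minimal sketch. It assumes each attempt solves a problem independently with the same probability p, which is a simplification; p and the function names are hypothetical, and none of this reflects MathArena's or OpenAI's actual setup.

```python
def expected_average_of_k(p: float, k: int = 4) -> float:
    """Expected score when k attempts are averaged.

    Averaging does not help: the expectation is p for any k.
    (Assumes every attempt succeeds with the same probability p.)
    """
    return p


def expected_best_of_n(p: float, n: int) -> float:
    """Probability that at least one of n independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    p = 0.25  # hypothetical per-attempt success rate, not a measured value
    print("average of 4:", expected_average_of_k(p, 4))
    for n in (1, 4, 16, 64):
        print(f"best of {n}:", expected_best_of_n(p, n))
```

Under these assumptions the averaged score stays at p no matter how many attempts you run, while the best-of-N score climbs toward 1 as N grows, so the two numbers aren't directly comparable.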

amelius 7 months ago

That's a claim that is far less plausible. OpenAI could have thrown more resources at the problem and I would be surprised if that didn't improve the results.