Comment by solenoid0937

7 hours ago

It's benchmaxxed.

If they actually matched Opus 4.6 on such a short timeline, it would have been mighty impressive. (Keep in mind this is a new lab and they are prohibited from doing distills.)

How do you know it's benchmaxxed?

  • Friends at Meta with access to the model + personal experience at Meta.

    Meta's performance process is essentially "show good numbers or you're out." So guess what people do when they don't have good numbers? They fudge them. Happens all across the company.

  • For one, they aren't using the latest versions of many of the benchmarks — e.g., ARC-AGI 2 rather than 3, etc.

  • Meta's benchmaxxing tendencies are well known. Llama 4 was mega benchmaxxed, and nothing suggests to me that Meta's culture has changed.

    • Re: changes, there's been enormous turnover in AI organizations, and in theory this one was developed by a "new" org. Whether that means less or more benchmaxxing is anyone's guess.
