Friends at Meta with access to the model + personal experience at Meta.
Meta's performance process is essentially "show good numbers or you're out." So guess what people do when they don't have good numbers? They fudge them. Happens all across the company.
For one, they aren't using the latest versions of many of the benchmarks — e.g., ARC-AGI 2 rather than 3.
Meta's benchmaxxing tendencies are well known. Llama 4 was heavily benchmaxxed, and nothing suggests to me that Meta's culture has changed.
Re: changes, there's been enormous turnover in AI organizations, and in theory this one was developed by a "new" org. Whether that means less or more benchmaxxing is anyone's guess.
More, I'd guess, since the new org needs to prove itself long enough for stock to vest. Fudging the benchmarks gives them a longer runway before they're all fired anyway.