Comment by tty456

7 hours ago

I don't get the comments trashing this. If it slightly beats or even matches Opus 4.6, it means Meta is capable of building a model competitive with the leading AI company. Sure, they spent a lot of money and will have ongoing costs. But how much more work would it take to turn that into a coding agent people are willing to try (and pay for) alongside their usage of a collection of agents (Claude, Codex, etc.)? It also means Meta doesn't have to pay another company to use a SOTA model across all their products (including IG, WhatsApp, and VR), which will matter to their balance sheet long term (despite the constant R&D spend).

23 comments

prodigycorp 6 hours ago

The comments trashing this come from justifiably skeptical people who remember the benchmaxxing of Llama 4. This model was out in the woods as early as a couple of months ago, but they didn't release it because it was at Gemini 2.5 Pro levels.

zozbot234 6 hours ago
The Llama 4 series was one of the earliest large MoEs to be made publicly available. People just ignored it because they were focused on running smaller and denser models at the time. We should know better these days.
- dilap 6 hours ago
  
  DeepSeek R1 was a publicly available MoE model that was getting a ton of attention before Llama 4. Llama 4 didn't get much attention because it wasn't good.
  
- prodigycorp 6 hours ago
  
  the models were objectively horrible
  

canes123456 16 minutes ago

Why go into coding agents? Both Anthropic and OpenAI are going all in on that. The opportunity is customer-facing AI now.

OpenAI has the mindshare, but they are going to have to decide whether to allocate their limited compute to free users or go all in on trying to keep up with Anthropic in enterprise.

modeless 4 hours ago

It's a decent model if the benchmarks are to be believed, but it won't be close to Opus in usefulness for programming. None of these benchmarks completely capture what makes a model useful for day-to-day coding tasks, unfortunately. It will take time for them to catch up, and Opus will keep improving in the meantime. But it's good to have more competition.

ai5iq a few seconds ago

Benchmarks miss the thing that actually matters for agentic use: how does behavior change over a multi-day horizon? A model that scores well on one-shot coding tasks can still make terrible decisions when it has persistent state and resource constraints. That's where you see the real gaps between models.

ChipopLeMoral 6 hours ago

> I don't get the comments trashing this.

People like to hate on Meta no matter what, whether or not it's justified. Not saying it isn't justified, just that it's many people's default bias.

redox99 6 hours ago

> If it slightly beats or even matches Opus 4.6

It doesn't though

ryeguy_24 6 hours ago
Curious why you think this. Any data points that led you to this?
- howdareme 6 hours ago
  
  The benchmarks they released
  

blazespin 2 hours ago

Because of bots, trillion-dollar IPOs, and even bigger stakes. People need to better appreciate the level of manipulation going on. Social media has an outsized impact. Bots and even people are getting paid to post and upvote/downvote narratives.

asdfman123 2 hours ago

> people are getting paid to post and upvote/downvote narratives

This problem will be solved shortly with better AI (if it hasn't essentially been solved already). No more humans in the loop, much lower costs for social media manipulation. Welcome to the future!