Comment by jasonjmcghee

4 months ago

Curious if anyone else had the same reaction as me

This model is specifically trained on this task and significantly[1] underperforms Opus.

Opus costs about 6x more.

Which seems... totally worth it based on the task at hand.

[1]: based on the total spread of tested models

26 comments


beernet 4 months ago

Agreed. The idea is nice and honorable. At the same time, if AI has proven one thing, it's that quality usually reigns over control and trust (except for some sensitive sectors and applications). Of course it's less capital-intensive, so it makes sense for a comparatively small EU startup to focus on that niche. It likely won't move the top-line needle much, though, for the reasons stated.

isodev 4 months ago
> quality usually reigns over control and trust
Most Copilot customers use Copilot because Microsoft has been able to pinky promise some level of control for their sensitive data. That's why many don't get to use Claude or Codex or Mistral directly at work and instead are forced through their lobotomised Copilot flavours.
Remember, as of yet, companies haven't been able to actually measure the value of LLMs ... so it's all in the hands of Legal to choose which models you can use based on marketing and big words.
- Eridrus 4 months ago
  
  This too will be solved. You can already get the frontier models from AWS/Google/Azure without needing to send your data to anyone else.
hermanzegerman 4 months ago
The EU could help them a lot by starting to enforce its laws, so that no US company can process European data, given that the Americans are unwilling to budge on the CLOUD Act.
That would also help reduce our dependency on American hyperscalers, which is much needed given how untrustworthy the US is right now (and hostile towards Europe, as its new security strategy lays out).
- bcye 4 months ago
  
  Unfortunately, this would be a rather nuclear option, given the continent’s insane reliance on technology that breaks its unenforced laws.
  
segmondy 4 months ago
Ha, keep putting your prompts and workflows into cloud models. They are not okay with being a platform, they intend to cannibalize all businesses. Quality doesn't always reign over control and trust. Your data and original ideas are your edge and moat.
- KetoManx64 4 months ago
  
  The same old speech that has been used throughout history. When cars were invented, people complained to everyone that Ford intended to cannibalize all horse-drawn carriages. When manufacturing was invented, it cannibalized the work of all the sewing and knitting companies that had women making one item at a time. When Google was invented, it cannibalized libraries, encyclopedias, etc. Yet nobody wants a horse-drawn carriage, nor to knit their own sweaters, nor to go to the library to look things up in a physical encyclopedia.
miohtama 4 months ago

The alignment tax eats directly into model quality, by double-digit percentages.
hrmtst93837 4 months ago

[flagged]

DarkNova6 4 months ago

I'm never sure how much faith one can put in such benchmarks, but in any case the picture seems to shift once you look at pass@2 and pass@3.

Still, the more interesting comparison would be against something such as Codex.
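
For context on pass@2 and pass@3: pass@k is the probability that at least one of k sampled attempts solves a problem. Below is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k)/C(n, k), assuming n attempts per problem of which c pass. The function name and the sample counts are illustrative assumptions, not figures from the benchmark under discussion.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n sampled attempts, c of which were correct.

    Computes 1 - C(n - c, k) / C(n, k): one minus the chance that a random
    subset of k attempts contains no correct one.
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k attempts include a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical example: 10 attempts on one problem, 3 of them correct.
for k in (1, 2, 3):
    print(f"pass@{k} = {pass_at_k(10, 3, k):.3f}")
```

Because pass@k can only grow with k, a model that trails at pass@1 may close part of the gap at pass@2 or pass@3, which is one reason rankings can look different across those columns.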

speedgoose 4 months ago

But you can run this model for free on a common battery-powered laptop sitting on your lap without cooking your legs.

hobofan 4 months ago
Sorry, but what are you talking about? This is a 120B-A6B model, which isn't runnable on any laptop except the most beefed-up MacBooks, and even then it will certainly drain the battery and cook your legs.
- naasking 4 months ago
  
  You can easily run a quant of this on a DGX Spark though. Seems like a small investment if it meaningfully improves Lean productivity.
  
- speedgoose 4 months ago
  
  Yeah my bad, it requires an expensive MacBook.
  I think it would still be fine for the legs and on battery for relatively short loads: https://www.notebookcheck.net/Apple-MacBook-Pro-M5-2025-revi...
  But 40 degrees and 30W of heat is a bit more than comfortable if you run the agent continuously.
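
A rough back-of-the-envelope sketch for the sizing argument in this subthread, assuming "120B" means 120 billion total parameters and counting the weights only (no KV cache, activations, or runtime overhead). The function names, bit widths, and memory budget are illustrative assumptions, not specifications of any particular machine.

```python
def weight_memory_gb(total_params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    # params * bits / 8 gives bytes; the factors of 1e9 cancel out.
    return total_params_billions * bits_per_weight / 8


def fits(total_params_billions: float, bits_per_weight: float, memory_budget_gb: float) -> bool:
    """True if the weights alone fit in the given (hypothetical) memory budget."""
    return weight_memory_gb(total_params_billions, bits_per_weight) <= memory_budget_gb


# Weights-only footprint of a 120B-parameter model at a few quantization levels.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(120, bits):.0f} GB")
```

In a mixture-of-experts model all experts typically have to be resident in memory, so the roughly 6B active parameters per token help with speed and power draw rather than with the memory footprint.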

nimchimpsky 4 months ago

The model is open source; you can run it locally. You don't think that's significant?