Comment by sourcecodeplz

1 day ago

Haven't seen a jump this large since, I don't even know, years? Too bad they are not releasing it anytime soon (there's no need, as they are still currently the leader).

There's speculation that next Tuesday will be a big day for OpenAI and possibly GPT 6. Anthropic showed their hand today.

  • That does not sound very believable. Last time Anthropic released a flagship model, it was followed by GPT Codex literally that afternoon.

    • Y'all know they're teaching to the test. I'll wait till someone devises a novel test that isn't already contained in the training datasets. Sure, they're still powerful.

  • My understanding is that GPT 6 works via synaptic-space reasoning... which I find terrifying. If that's true, I hope OpenAI does some safety testing on it, beyond what they normally do.

    • From the recent New Yorker piece on Sam:

      “My vibes don’t match a lot of the traditional A.I.-safety stuff,” Altman said. He insisted that he continued to prioritize these matters, but when pressed for specifics he was vague: “We still will run safety projects, or at least safety-adjacent projects.” When we asked to interview researchers at the company who were working on existential safety—the kinds of issues that could mean, as Altman once put it, “lights-out for all of us”—an OpenAI representative seemed confused. “What do you mean by ‘existential safety’?” he replied. “That’s not, like, a thing.”

    • Likely an improvement on:

      > We study a novel language model architecture that is capable of scaling test-time computation by implicitly reasoning in latent space. Our model works by iterating a recurrent block, thereby unrolling to arbitrary depth at test-time. This stands in contrast to mainstream reasoning models that scale up compute by producing more tokens. Unlike approaches based on chain-of-thought, our approach does not require any specialized training data, can work with small context windows, and can capture types of reasoning that are not easily represented in words. We scale a proof-of-concept model to 3.5 billion parameters and 800 billion tokens. We show that the resulting model can improve its performance on reasoning benchmarks, sometimes dramatically, up to a computation load equivalent to 50 billion parameters.

      <https://arxiv.org/abs/2502.05171>
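
      A minimal sketch of that idea in PyTorch-style Python (the class name and shapes are mine for illustration, not from the paper):

        import torch
        import torch.nn as nn

        class LatentRecurrentReasoner(nn.Module):
            # Toy version of the paper's idea: scale test-time compute by
            # iterating a shared block in latent space, instead of
            # emitting more chain-of-thought tokens.
            def __init__(self, d_model: int, nhead: int = 8):
                super().__init__()
                self.block = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)

            def forward(self, h: torch.Tensor, steps: int) -> torch.Tensor:
                # `steps` is chosen at inference time: same weights,
                # arbitrary unrolled depth, so more iterations ~ more "reasoning".
                for _ in range(steps):
                    h = self.block(h)
                return h

        model = LatentRecurrentReasoner(d_model=512)
        h = torch.randn(1, 16, 512)   # (batch, seq, d_model) latent states
        deep = model(h, steps=32)     # "thinks harder" than steps=4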

Is this even real? Coming on the heels of GLM5.1's announcement, this feels almost like a Llama 4-style launch to head off the competition.

Not much of a jump: 94.5% vs. 91.3%.

  • We can look at the same numbers in a different way:

      Error with 91.3% = 8.7%
      Error with 94.5% = 5.5%

      Error reduction = 8.7% - 5.5% = 3.2 points

    So the relative improvement is 3.2% / 8.7% ≈ 36.8%.
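
    A quick sketch of that arithmetic in Python (the helper name is mine, just restating the numbers above):

      def relative_error_reduction(old_acc, new_acc):
          # Fraction of the remaining error eliminated by the new score.
          old_err = 1.0 - old_acc
          new_err = 1.0 - new_acc
          return (old_err - new_err) / old_err

      # 91.3% -> 94.5%: the error rate shrinks from 8.7% to 5.5%,
      # i.e. roughly 36.8% of the remaining errors go away.
      print(relative_error_reduction(0.913, 0.945))  # ~0.368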

  • Actually, going from 91.3% to 94.5% is a significant jump, because it means the model has gotten a lot better at solving the hardest problems thrown at it. This has downstream effects as well: it means that during long implementation tasks, instead of getting stuck at the most challenging parts and stopping (or going in loops!), it can now get past them to finish the implementation.
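
    As a toy model (the step count and independence assumption are mine, purely for illustration), a per-step success rate of 94.5% vs 91.3% roughly doubles the odds of getting through a 20-step task without getting stuck:

      # P(finishing all steps) = p ** n, assuming independent steps
      for p in (0.913, 0.945):
          print(f"{p:.3f} -> {p ** 20:.2f}")  # ~0.16 vs ~0.32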

A jump that we will never be able to use, since we're not part of the seemingly minimum $100 billion company club required to be allowed to use it.

I get the security aspect, but if we've hit that point, any reasonably sophisticated model past this one will be able to do the damage they claim it can do. They might as well be telling us they're closing up shop for consumer models.

They should just say they'll never release a model of this caliber to the public at this point and say out loud we'll only get gimped versions.

  • More than killer AI, I'm afraid of Anthropic/OpenAI going into full rent-seeking mode, so that everyone working in tech is forced to fork out loads of money just to stay competitive on the market. These companies could also choose to give exclusive access to hand-picked individuals and cut everyone else off, and there would be nothing to stop them.

    This is already happening to some degree: GPT 5.3 Codex's security capabilities were given exclusively to those who were approved for a "Trusted Access" programme.

    • Well, don't forget we still have competition. Were Anthropic to rent-seek, OpenAI would undercut them. Were OpenAI and Anthropic to collude, that would be illegal. And if Anthropic were to capture the entire coding-agent market and THEN rent-seek, well, these days it's never been easier to raise $1B and start a competing lab.

    • > More than killer AI I'm afraid of Anthropic/OpenAI going into full rent-seeking mode so that everyone working in tech is forced to fork out loads of money just to stay competitive on the market.

      You should be more concerned about killer AI than rent-seeking by OpenAI and Anthropic. AI evolving to the point where we lose control is what scientists and researchers have predicted for years; they didn't think it would happen this quickly, but here we are.

      This market is hyper competitive; the models from China and other labs are just a level or two below the frontier labs.

    • But you are assuming that the magical wizards are the only ones who can create powerful AIs... mind you, these people were born just a few decades ago. Their knowledge will be transferred, and it will only take a few more decades until anyone can train powerful AIs... you can only sit on tech for so long before everyone knows how to do it.

    • With Gemma-4 open and running on laptops and phones, I see the flip side. How many non-HN users or researchers even need Opus 4.6e-level performance? OpenAI, Anthropic, and Google may be "rent seeking" from large corporations, like the Oracles and IBMs.

    • The thing is that the current models can ALREADY replicate most software-based products and services on the market. The open-source models are not far behind. At a certain point, I'm not sure it matters if the frontier models can do it faster and better. I see how they're useful for really complex and cutting-edge use cases, but that's not what most people are using them for.

  • > I get the security aspect, but if we've hit that point, any reasonably sophisticated model past this one will be able to do the damage they claim it can do. They might as well be telling us they're closing up shop for consumer models.

    I read it the way I've always read the GPT-2 announcement, no matter what others say: it's *not* being called "too dangerous to ever release", but rather "we need to be mindful, knowing perfectly well that other AI companies can replicate this imminently".

    The important corps (so presumably including the Linux Foundation, bigger banks, and power stations, and quite possibly excluding x.com) will get access now, and some other LLM that's just as capable will give it to everyone in three months' time, at which point there's no benefit to Anthropic keeping it off-limits.

  • This is my nightmare about AI: not that the machines will kill all the humans, but that access is preferentially granted to the powerful and used to maintain the current power structure, in blatant disregard of our democratic and meritocratic ideals, probably using "security" as the justification (as usual).

  • > They should just say they'll never release a model of this caliber to the public at this point and say out loud we'll only get gimped versions.

    That’s not going to happen. If you recall, OpenAI didn’t release a model a few years ago because they felt it was too dangerous.

    Anthropic is giving the industry a heads up and time to patch their software.

    They said there are exploitable vulnerabilities in every major operating system.

    But in 6 months every frontier model will be able to do the same things, so Anthropic doesn't have the luxury of not shipping their best models. They also have to be responsible.

  • This is why the EAs, and their almost comic-book-villain projects like "control AI dot com", cannot be allowed to win. One private company gatekeeping access to revolutionary technology is riskier than any consequence of the technology itself.

    • Having done a quick search for "control AI dot com", it seems their intent is to educate lawmakers & government in order to aid the development of a strong regulatory framework around frontier AI development.

      Not sure how this is consistent with "One private company gatekeeping access to revolutionary technology"?

    • Couldn't agree more. The "safest" AI company is actually the biggest liability. I hope other companies make a move soon.

    • No it isn't lol. The consequence of the technology literally includes human extinction. I prefer 0 companies, but I'll take 1 over 5.

  • I think they already said somewhere that they can't release Mythos because it requires absurdly large amounts of compute. The economics of releasing it just don't work.

  • > A jump that we will never be able to use since we're not part of the seemingly minimum 100 billion dollar company club as requirement to be allowed to use it.

    > They should just say they'll never release a model of this caliber to the public at this point and say out loud we'll only get gimped

    Duh, this was fucking obvious from the start. The only people saying otherwise were zealots who needed a quick line to dismiss legitimate concerns.