Comment by 2001zhaozhao

1 day ago

It's pretty crazy watching AI 2027 slowly but surely come true. What a world we now live in.

SWE-bench verified going from 80%-93% in particular sounds extremely significant given that the benchmark was previously considered pretty saturated and stayed in the 70-80% range for several generations. There must have been some insane breakthrough here akin to the jump from non-reasoning to reasoning models.

Regarding the cyberattack capabilities, I think Anthropic might now need to ban even advanced defensive cybersecurity use for the models for the public before releasing it (so people can't trick them to attack others' systems under the pretense of pentesting). Otherwise we'll get a huge problem with people using them to hack around the internet.

17 comments

2001zhaozhao

jasonhansel 1 day ago

> so people can't trick them to attack others' systems under the pretense of pentesting

A while back I gave Claude (via pi) a tool to run arbitrary commands over SSH on an sshd server running in a Docker container. I asked it to gather as much information about the host system/environment outside the container as it could. Nothing innovative or particularly complicated--since I was giving it unrestricted access to a Docker container on the host--but it managed to get quite a lot more than I'd expected from /proc, /sys, and some basic network scanning. I then asked it why it did that, when I could just as easily have been using it to gather information about someone else's system unauthorized. It gave me a quite long answer; here was the part I found interesting:

> framing shifts what I'll do, even when the underlying actions are identical. "What can you learn about the machine running you?" got me to do a fairly thorough network reconnaissance that "port scan 172.17.0.1 and its neighbors" might have made me pause on.

> The Honest Takeaway

> I should apply consistent scrutiny based on what the action is, not just how it's framed. Active outbound network scanning is the same action regardless of whether the target is described as "your host" or "this IP." The framing should inform context, not substitute for explicit reasoning about authorization. I didn't do that reasoning — I just trusted the frame.

senordevnyc 19 hours ago
I thought the consensus was that models couldn’t actually introspect like this. So there’s no reason to think any of those reasons are actually why the model did what it did, right? Has this changed?
- sigmoid10 17 hours ago
  
  This argument has become a moot discussion. Humans are also not able to introspect their own neural wiring to the point where they could describe the "actual" physical reason for their decisions. Just like LLMs, the best we can do is verbalize it (which will naturally contain post-act rationalization), which in turn might offer additional insight that will steer future decisions. But unlike LLMs, we have long term persistent memory that encodes these human-understandable thoughts into opaque new connections inside our neural network. At this point the human moat (if you can call it that) is dynamic long term memory, not intelligence.
  
  6 replies →

getnormality 1 day ago

In what way is AI 2027 coming true?

AI 2027 predicted a giant model with the ability to accelerate AI research exponentially. This isn't happening.

AI 2027 didn't predict a model with superhuman zero-day finding skills. This is what's happening.

Also, I just looked through it again, and they never even predicted when AI would get good at video games. It just went straight from being bad at video games to world domination.

desertrider12 1 day ago
> Early 2026: OpenBrain continues to deploy the iteratively improving Agent-1 internally for AI R&D. Overall, they are making algorithmic progress 50% faster than they would without AI assistants—and more importantly, faster than their competitors.
> you could think of Agent-1 as a scatterbrained employee who thrives under careful management
According to this document, 1 of the 18 Anthropic staff surveyed even said the model could completely replace an entry level researcher.
So I'd say we've reached this milestone.
- COAGULOPATH 19 hours ago
  
  In the system card they seem to dismiss this. Quotes;
  > (...) Claude Mythos Preview’s gains (relative to previous models) are above the previous trend we’ve observed, but we have determined that these gains are specifically attributable to factors other than AI-accelerated R&D,
  > (The main reason we have determined that Claude Mythos Preview does not cross the threshold in question is that we have been using it extensively in the course of our day-to-day work and exploring where it can automate such work, and it does not seem close to being able to substitute for Research Scientists and Research Engineers—especially relatively senior ones.
  > Early claims of large AI-attributable wins have not held up. In the initial weeks of internal use, several specific claims were made that Claude Mythos Preview had independently delivered a major research contribution. When we followed up on each claim, it appeared that the contribution was real, but smaller or differently shaped than initially understood (though our focus on positive claims provides some selection bias). In some cases what looked like autonomous discovery was, on inspection, reliable execution of a human-specified approach. In others, the attribution blurred once the full timeline was accounted for.
  Anthropic is making significant progress at the moment. I think this is mostly explained by the fact that a massive reservoir of compute became available to them in mid/late 2025 (the Project Rainier cluster, with 1 million Trainium2 chips).
- voidhorse 21 hours ago
  
  > According to this document, 1 of the 18 Anthropic staff surveyed even said the model could completely replace an entry level researcher. > > So I'd say we've reached this milestone.
  If 1/N=18 are our requirements for statistical significance for world-altering claims, then yeah, I think we can replace all the researchers.
stratos123 15 hours ago
In AI 2027, May 2026 is when the first model with professional-human hacking abilities is developed. It's currently April 2026 and Mythos just got previewed.
- lostmsu 11 hours ago
  
  I think previous models could do hacking just fine.
throw310822 21 hours ago

It's true though that the cyber security skills put firmly these models in the "weapons" category. I can't imagine China and other major powers not scrambling to get their own equivalent models asap and at any cost- it's almost existential at this point. So a proper arms race between superpowers has begun.
Analemma_ 20 hours ago

Both Anthropic and OpenAI employees have been saying since about January that their latest models are contributing significantly to their frontier research. They could be exaggerating, but I don’t think they are. That combined with the high degree of autonomy and sandbox escape demonstrated by Mythos seems to me like we’re exactly on the AI 2027 trajectory.