Comment by oliver236

1 day ago

isn't this insane? why aren't people freaking out? the jump in capability is outrageous. anyone?

If it's so great at software engineering and bug fixing, then why does Claude Code still have 5000+ open bugs?

https://github.com/anthropics/claude-code/issues?q=is%3Aissu...

Apparently whatever SWE-bench is measuring isn't very relevant.

Anthropic needs to show that its models continually get better. If the model showed minimal to no improvement, it would cause significant damage to their valuation. We have no way of validating any of this; there are no independent researchers who can back any of the assertions Anthropic makes.

I don’t doubt they have found interesting security holes, the question is how they actually found them.

This System Card is just a sales whitepaper and just confirms what that “leak” from a week or so ago implied.

  • Most big tech companies have access to the model; you can absolutely "validate their claims" or talk to someone who can.

  • Well, they said they'll be giving the model to select tech companies to use, so soon there will be independent users who can comment on its capabilities.

I've been increasingly "freaking out" for about 3-4 years now, and it seems that the pessimistic scenario is materializing. It looks like it will be over for software engineers in the not-so-distant future. In January 2025 I said that I expected software engineers to be replaced in 2 years (pessimistic) to 5 years (optimistic). Right now I'm guessing 1 to 3 years.

  • > I've been increasingly "freaking out" for about 3-4 years now, and it seems that the pessimistic scenario is materializing. It looks like it will be over for software engineers in the not-so-distant future. In January 2025 I said that I expected software engineers to be replaced in 2 years (pessimistic) to 5 years (optimistic). Right now I'm guessing 1 to 3 years.

    Tell me how this will replace Jira, planning, and convincing PMs about viability. Programming is only part of the job devs do.

    AI psychosis is truly next level in these threads.

    • > Programming is only part of the job devs do.

      Programming is a huge part of the job. In a world where AI does the programming we're going to need 80% fewer software professionals.

      It won't be a full replacement of the role, you're correct there - but it'll be a major downsizing because of productivity gains.

    • If the "new software engineering" is Jira, planning, and convincing PMs about viability all day, you can count me out!

    • Have you never filed JIRA tickets, planned, or debated viability with an AI? Which of those do you find an AI absolutely cannot do better than the average developer?

  • it's not gonna get much more autonomous without self-play and a major change in architecture

  • I assure you it will soon become very clear that mass job losses are one of the least concerning side effects of developing the magic "everything that can plausibly be done within the constraints of physics is now possible" machine.

    We're opening a can of worms which I don't think most people have the imagination to understand the horrors of.

    • While I'm definitely concerned that AI is a massive driver of centralization of power, at least in theory, being able to do far more things in the space of "things physics admits to be possible" is massively wealth-enhancing. That is literally how we have gotten from the pre-industrial world to today.


    • yeesh yep, though it's more Pandora's Box than a can of worms, since it can't exactly be closed once it's opened

It's going to be expensive to serve (also not generally available), considering they said it's the largest model they've ever trained.

I suspect it's going to be used to train/distill lighter models. The exciting part for me is the improvement in those lighter models.

  • It seems inevitable that costs will come down over time. Expensive models today will be cheap models in a few years.

  • What's interesting is that scaling appears to continue to pay off. Gwern was right, as always.

Freak out about what? I read the announcement and thought "that's a dumb name, they sure are full of themselves" – then I went back to using Claude as a glorified commit message writer. For all its supposed leaps, AI hasn't affected my life much in the real world except to make HN stories more predictable.

I think there's no SOTA advance on this one worthy of "freaking out".

Looks like they just built a way larger model, with the same quirks as Claude 4. Seems like a super expensive "Claude 4.7" model.

I have no doubt that Google and OpenAI have already done that for internal (or even government) usage.

I am freaking out. The world is going to get very messy extremely quickly in one or two further jumps in capability like this.

  • Messy in a way that would affect you?

    • I can think of several possible messy outcomes that would be able to directly affect me, not all mutually exclusive:

      - Job loss by me being replaced by an AI or by somebody using an AI. Or by an AI using an AI.

      - Resulting societal instability once blue collar jobs get fully automated at scale, and there is no plan in place to replace this loss of people's livelihoods.

      - People turning to AI models instead of friends for emotional support, loss of human connection.

      - Erosion of democracy by making authoritarianism and control very scalable: broad, detailed population surveillance and automated investigation using LLMs, which was previously bounded by manpower.

      - Autonomous weapons, "Slaughterbots" as in the short film from 2017.

      - Biorisk through dangerous biological capabilities that enable a smaller, less-skilled team of terrorists to use a jailbroken LLM to create something dangerous.

      - Other powers in the world deciding that this technology is too powerful in the hands of the US, or too dangerous to be built at all, and has to be stopped by any means.

      - Loss of, or voluntary ceding of, control over something much smarter than us. "If Anyone Builds It, Everyone Dies"

    • Exploits in embedded systems that will never be properly updated are just one thing that comes to mind if one really thinks about it.

"some model I don't get to use is much better at benchmarks"

pick one or more: comically huge model, test-time scaling at 10e12 W, benchmark overfit

Wait until you see real usage. Benchmark numbers do not necessarily translate to real world performance (at least not by the same amount).

Until recently I would have described myself as an AI skeptic. HN has been a great source for cope on the AI subject over the years. You can find nitpicks, caveats, all sorts of reasons to believe things aren’t as significant as they seem. For me Opus 4.5 was the inflection point where I started to think “maybe this isn’t a bubble.” The figures in this report, if accurate, are terrifying.