Comment by oliver236

1 day ago

isn't this insane? why aren't people freaking out? the jump in capability is outrageous. anyone?

If it's so great at software engineering and bug fixing, then why does Claude Code still have 5000+ open bugs?

https://github.com/anthropics/claude-code/issues?q=is%3Aissu...

Apparently whatever SWE-bench is measuring isn't very relevant.

Anthropic needs to show that its models continually get better. If the model showed minimal to no improvement, it would cause significant damage to their valuation. We have no way of validating any of this; there are no independent researchers who can back any of the assertions Anthropic makes.

I don’t doubt they have found interesting security holes, the question is how they actually found them.

This System Card is just a sales whitepaper and just confirms what that “leak” from a week or so ago implied.

  • Most big tech companies have access to the model; you can absolutely "validate their claims" or talk to someone who can.

  • Well, they said they'll be giving the model to select tech companies to use, so soon there will be independent users who can comment on its capabilities.

I've been increasingly "freaking out" for about 3-4 years now, and it seems that the pessimistic scenario is materializing. It looks like it will be over for software engineers in the not-so-distant future. In January 2025 I said that I expected software engineers to be replaced in 2 years (pessimistic) to 5 years (optimistic). Right now I'm guessing 1 to 3 years.

  • > I've been increasingly "freaking out" for about 3-4 years now, and it seems that the pessimistic scenario is materializing. It looks like it will be over for software engineers in the not-so-distant future. In January 2025 I said that I expected software engineers to be replaced in 2 years (pessimistic) to 5 years (optimistic). Right now I'm guessing 1 to 3 years.

    Tell me how this will replace Jira, planning, and convincing PMs about viability. Programming is only part of the job devs do.

    AI psychosis is truly next level in these threads.

    • > Programming is only part of the job devs do.

      Programming is a huge part of the job. In a world where AI does the programming we're going to need 80% fewer software professionals.

      It won't be a full replacement of the role, you're correct there - but it'll be a major downsizing because of productivity gains.

    • If the "new software engineering" is Jira, planning, and convincing PMs about viability all day, you can count me out!

    • Have you never filed JIRA tickets, planned, or debated viability with an AI? Which of those do you find an AI absolutely cannot do better than the average developer?

  • it's not gonna get much more autonomous without self-play and a major change in architecture

  • I assure you it will soon become very clear that mass job losses are one of the least concerning side effects of developing the magic "everything that can plausibly be done within the constraints of physics is now possible" machine.

    We're opening a can of worms which I don't think most people have the imagination to understand the horrors of.

    • While I'm definitely concerned that AI is a massive driver of centralization of power, at least in theory, being able to do far more things in the space of "things physics admits to be possible" is massively wealth-enhancing. That is literally how we have gotten from the pre-industrial world to today.


    • yeesh yep, though it's more Pandora's Box than a can of worms, since it can't exactly be closed once it's opened

It's going to be expensive to serve (also not generally available), considering they said it's the largest model they've ever trained.

I suspect it's going to be used to train/distill lighter models. The exciting part for me is the improvement in those lighter models.

  • It seems inevitable that costs will come down over time. Expensive models today will be cheap models in a few years.

  • What's interesting is that scaling appears to continue to pay off. Gwern was right, as always.

Freak out about what? I read the announcement and thought "that's a dumb name, they sure are full of themselves" – then I went back to using Claude as a glorified commit message writer. For all its supposed leaps, AI hasn't affected my life much in the real world except to make HN stories more predictable.

I think there's no SOTA advance on this one worthy of "freaking out".

Looks like they just built a way larger model, with the same quirks as Claude 4. Seems like a super expensive "Claude 4.7" model.

I have no doubt that Google and OpenAI have already done that for internal (or even government) usage.

I am freaking out. The world is going to get very messy extremely quickly in one or two further jumps in capability like this.

  • Messy in a way that would affect you?

    • I can think of several possible messy outcomes that would be able to directly affect me, not all mutually exclusive:

      - Job loss by me being replaced by an AI or by somebody using an AI. Or by an AI using an AI.

      - Resulting societal instability once blue collar jobs get fully automated at scale, and there is no plan in place to replace this loss of people's livelihoods.

      - People turning to AI models instead of friends for emotional support, loss of human connection.

      - Erosion of democracy by making authoritarianism and control very scalable: broad, detailed population surveillance and automated investigation using LLMs, which was previously bounded by manpower.

      - Autonomous weapons, "Slaughterbots" as in the short film from 2017.

      - Biorisk through dangerous biological capabilities that enable a smaller, less-skilled team of terrorists to use a jailbroken LLM to create something dangerous.

      - Other powers in the world deciding that this technology is too powerful in the hands of the US, or too dangerous to be built at all, and has to be stopped by any means.

      - Loss of, or voluntary ceding of, control over something much smarter than us. "If Anyone Builds It, Everyone Dies"

    • Exploits in embedded systems that will never be properly updated are just one thing that comes to mind if one really thinks about it.

"some model I don't get to use is much better at benchmarks"

pick one or more: comically huge model, test-time scaling at 10e12 W, benchmark overfit

Wait until you see real usage. Benchmark numbers do not necessarily translate to real world performance (at least not by the same amount).

Until recently I would have described myself as an AI skeptic. HN has been a great source for cope on the AI subject over the years. You can find nitpicks, caveats, all sorts of reasons to believe things aren’t as significant as they seem. For me Opus 4.5 was the inflection point where I started to think “maybe this isn’t a bubble.” The figures in this report, if accurate, are terrifying.