Comment by pjerem

21 hours ago

What gp wanted to say is that models are now so smart and useful that even if they managed to be EVEN MORE smart and useful, you wouldn't even notice it.

Honestly, there is nothing in my head that Claude cannot handle. Maybe it can be more this or that but I can already barely exploit Opus 4.7.

And I'm using DeepSeek 4 Pro for my personal use and while it's a little behind, it's not that far.

I think the situation can be very dangerous for US AI companies because if current models are already capable of doing almost anything, nobody will want to move to the next model, even if it's 10x better. OTOH, open source models like DeepSeek are doing mostly the same work for 1/10 of the price.

Also the more I play with Pi, the more I think LLMs are already held back not by their own capabilities but by the lack of agency we allow them to have. There is more value today in a capable harness for current LLMs than in a better LLM.

10 comments

suttontom 21 hours ago

Are you joking? Is there literally "nothing" you can imagine that Claude can't do?

tjwebbnorfolk 1 hour ago

Not OP, but in 6 months of using Opus I haven't yet found anything that I know how to do but it does not. On the contrary -- it can do things instantly that I would have needed a ~week refresher on some SDK or some algorithm in order to implement myself--plus a ton of thrash/debugging time.
What have YOU thought of that Claude can't do?

dead_internet 20 hours ago

[dead]

coldtea 20 hours ago

>What gp wanted to say is that models are now so smart and useful that even if they managed to be EVEN MORE smart and useful, you wouldn't even notice it.

I think what gp said was that the improvements are incremental, that we haven't seen a big revolutionary change in 2-3 years, and that the pace is slowing down.

czl 18 hours ago

> What gp wanted to say is that models are now so smart and useful that even if they managed to be EVEN MORE smart and useful, you wouldn't even notice it.

If benchmarks across the board keep trending up and you still don't notice a difference, that's not evidence the model stopped improving. More likely your tasks aren't hard enough to expose the gains, or the model has passed the point where you're able to judge it.

You can only tell a good answer from a great one up to your own ceiling. Once the model clears that, both look the same to you, and the extra capability is real whether or not you can see it.

pjerem 18 hours ago

But that’s exactly what I said! I know the model will continue to improve and I don’t deny that, I even strongly believe it. My point is that at that point it probably won’t change anything for me.
If Opus 10 were released tomorrow and were nearly AGI, I would still use it like 4.7 because in daily use, I am the limit (along with the harness).
So as a customer paying for tokens, I’m probably going to search for better cost rather than more intelligence.

dzhiurgis 15 hours ago

> Honestly, there is nothing in my head that Claude cannot handle

A friend writes marine autopilots in C++ with 64 KB of memory. Claude is totally useless for him.

In my experience, all LLMs fail pretty quickly on any sort of more difficult backend logic, especially when you need to logically work out the business logic (partly if not mostly because the model just doesn't have the context you do).

claytongulick 20 hours ago

> Honestly, there is nothing in my head that Claude cannot handle.

One idea is that maybe it could figure out how many L's are in the word "google" [1]

Or, maybe which days of the week have a "d" in their spelling [2].

[1] https://x.com/FatherPhi/status/2059659658428912040?s=20

[2] https://x.com/FatherPhi/status/2054212816069132461?s=20

speff 18 hours ago

From what I understand, that's a problem with the way it receives data. The model doesn't see the letters g,o,o,g,l,e to count them. Just like how I can't sense radio waves. If I wanted to find that out, I'd get a tool to detect waves. If the LLM wants to find that out, it can write a script to find it.
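
For what it's worth, a minimal sketch of the kind of script meant here might look like the following. It is Python, and the names `count_letter`, `days_containing`, and `DAYS` are made up for illustration; this is not what any particular model or harness actually runs.

```python
# Hypothetical helpers: letter questions answered by code instead of by
# reading tokens. Names are illustrative assumptions, not a real tool's API.

DAYS = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


def days_containing(letter: str) -> list[str]:
    """Return the days of the week whose spelling contains the letter."""
    return [day for day in DAYS if letter.lower() in day.lower()]


if __name__ == "__main__":
    print(count_letter("google", "l"))  # 1
    print(days_containing("d"))         # all seven, since each ends in "day"
```

The point is only that the counting happens character by character in the script, so it doesn't matter how the model's input was split up.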

CamperBob2 18 hours ago

Wow, which Claude model flubbed that question? Certainly not anything recent...? The 2-bit quant of K2.6 running locally on my own hardware has no problem with it: https://i.imgur.com/tL0FLjZ.png
So Claude has no excuses here.
Edit: even Qwen 3.6 27B handles it ( https://i.imgur.com/jleJxj2.png ), and of course Claude does. I had to go all the way back to Opus 3 to get it to fail (https://i.imgur.com/uJOH2nP.png).