Comment by altern8
7 days ago
Every time I read these titles, I wonder if people are for some reason pushing the narrative that Claude is way smarter than it really is, or if I'm using it wrong.
They want me to code AI-first, and the amount of hallucinations and weird bugs and inconsistencies that Claude produces is massive.
Lots of the code it pushes would NOT have passed a human-to-human code review 6 months ago.
Apart from obvious PR (if you ever needed to lean into the AI wave, this of all places is it) and fanboyism, which is just part of human nature, why can't both be true?
It can genuinely excel at some things while being less than helpful at others. These are computers at their core, 1000x rehashed and now with an extra twist.
It's always the inconsistencies which amaze me, from the article:
> I have so many bugs in the Linux kernel that I can’t report because I haven’t validated them yet
You have "so many"? Are they uncountable for some reason? You "haven't validated" them? How long does that take?
> found a total of five Linux vulnerabilities
And how much did it cost you in compute time to find those 5?
These articles are always fantastically light on the details which would make their case for them. Instead it's always breathless prognostication. I'm deeply suspicious of this.
> And how much did it cost you in compute time to find those 5?
This is the last thing I'd worry about if the bug is serious in any way. You have attackers like nation states that will have huge budgets to rip your software apart with AI and exploit your users.
Also there have been a number of detailed articles about AI security findings recently.
Yeah, this was one of my first thoughts too. It’s impossible to know but I wonder how many of these “unknown exploits” have been in use by government agencies for years already. Or decades, apparently.
I'd be interested in how it compares (in terms of time, money and false positives) with fuzzing.
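For anyone unfamiliar with the fuzzing baseline being asked about: at its simplest, fuzzing just mutates inputs at random and watches for crashes. Here's a minimal, coverage-blind sketch in Python; real fuzzers like AFL++ or libFuzzer add coverage feedback and corpus management on top, and `parse_record` here is a made-up toy target, not anything from the article:

```python
import random

def parse_record(data: bytes) -> None:
    # Hypothetical buggy parser: raises when a length field
    # claims more bytes than the buffer actually holds.
    if len(data) >= 2 and data[0] == 0xFF:
        length = data[1]
        if length > len(data) - 2:
            raise ValueError("length field exceeds buffer")

def fuzz(target, seed: bytes, iterations: int = 50_000) -> list[bytes]:
    """Randomly mutate the seed and collect inputs that crash the target."""
    rng = random.Random(0)  # fixed seed so the run is reproducible
    crashes = []
    for _ in range(iterations):
        data = bytearray(seed)
        # Flip a handful of random bytes per iteration.
        for _ in range(rng.randint(1, 4)):
            data[rng.randrange(len(data))] = rng.randrange(256)
        try:
            target(bytes(data))
        except Exception:
            crashes.append(bytes(data))
    return crashes

crashing_inputs = fuzz(parse_record, seed=b"\x00" * 8)
print(f"found {len(crashing_inputs)} crashing inputs in 50k iterations")
```

The point of the comparison: this brute-force loop is cheap per iteration but blind, so its cost scales with how deeply the bug is buried behind input conditions, whereas an LLM reading the code pays a large fixed cost per query regardless of bug depth. That's exactly why the time/money/false-positive numbers would be interesting.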
You are suspicious because you probably haven't worked anywhere that's AI-first. Anyone who's worked at a modern tech company will find this absolutely believable.
Like what, you expect Nicholas to test each vuln when he has more important work to do (i.e., his actual job)?
What models are you using, on what type of codebases, with what tools?
Not OC, but I tried OpenCode with Gemini, Claude, and Kimi, and all of them were completely unable to solve any non-trivial problem that isn't easily handled by some existing algorithm.
I understand how people use those tools if all they do is build CRUD endpoints and UIs for those endpoints (which is admittedly what most programmers probably do for their job). But for anything that requires any sort of problem solving skills, I don't understand how people use them. I feel like I live in a completely different world from some of the people who push agentic coding.
I'm using Claude Code with the latest version of Sonnet, using the official VS Code extension.
At my company they set it up that way.