Comment by raddan
3 days ago
I don’t know if a blanket answer is possible. Yesterday I asked for a simplification of a working algorithm I had written (a computational geometry problem, to a first approximation). ChatGPT responded with what looked like a rather clever simplification that seemed to rely on some number theory hack I did not understand, so I asked it to explain. It proceeded to demonstrate to itself that the simplification was actually wrong, then came up with two alternative algorithms that it also concluded were wrong, before deciding that my own algorithm was best. Then it rewrote my program using the original flawed algorithm anyway.
I later worked out a simpler version myself, on paper; the whole exchange was kind of a waste of time. I tend not to ask for solutions from whole cloth anymore. It’s much better at giving me small in-context examples of API use, finding handy functions in libraries, or pointing out corner cases.
I think there are two different cases here that need to be treated carefully when working with AI:
1. Using a well-known but complex algorithm that I don't remember fully. The AI will know it and integrate it into my existing code faster (often much, much faster) than I could, and then I can review and confirm it's correct.
2. Developing a new algorithm, or at least a novel application of an existing one, or using a complex algorithm in an unusual way. The AI will need a lot of guidance here, and often I'll regret asking it in the first place.
I haven't used Claude Code, but every time I've criticized AI in the past, there's always someone who will say "this tool released in the last month totally fixes everything!"... And so far they haven't been correct. But the tools are getting better, so maybe this time it's true.
$200 a month is a big ask, though: it's completely out of reach for most people on earth (students, hobbyists, people in developing countries where it's close to a monthly wage), so I hope it doesn't become normalized.
> I haven't used Claude Code, but every time I've criticized AI in the past, there's always someone who will say "this tool released in the last month totally fixes everything!"... And so far they haven't been correct. But the tools are getting better, so maybe this time it's true.
The cascading error problem means this will probably never be true. Because LLMs fundamentally guess the next token based on the previous tokens, whenever one gets a single token wrong, every future token becomes even more likely to be wrong, which snowballs into absurdity.
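To get a feel for the scale of the compounding, here is a toy calculation. It assumes, unrealistically, that each token is independently correct with some fixed probability; in a real autoregressive model the errors are correlated, since one wrong token poisons the context for every later prediction, which is arguably worse:

```python
# Toy model, not a real LLM: if each generated token is independently
# correct with probability p, the chance that an n-token answer
# contains no mistakes decays geometrically with n.
def p_all_correct(p_token: float, n_tokens: int) -> float:
    return p_token ** n_tokens

for n in (10, 100, 1000):
    print(f"{n:>5} tokens: {p_all_correct(0.999, n):.4f}")
# 10 tokens: 0.9900 / 100 tokens: 0.9048 / 1000 tokens: 0.3677
```

Even at 99.9% per-token accuracy, a thousand-token answer is more likely than not to contain at least one error under these assumptions.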
Extreme hallucination issues can probably eventually be resolved by giving it access to a compiler and, where appropriate, feeding it test cases, but I don't think the cascading errors will ever be resolved. The best-case scenario is that it eventually becomes able to say "I don't know how to achieve this." Of course, then you ruin the mystique of LLMs that think they can solve any problem.
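A minimal sketch of what that compiler-in-the-loop idea could look like. Nothing here is any real tool's API: `ask_model` is a hypothetical callable wrapping an LLM, and `cc` stands in for whatever toolchain applies.

```python
# Rough sketch of the compiler-in-the-loop idea, under the assumptions
# stated above; the point is the feedback loop, not the specifics.
import pathlib
import subprocess
import tempfile
from typing import Callable, Optional

def try_compile(source: str) -> tuple[bool, str]:
    """Write a candidate C file, compile it, and return (ok, diagnostics)."""
    path = pathlib.Path(tempfile.mkdtemp()) / "candidate.c"
    path.write_text(source)
    result = subprocess.run(
        ["cc", "-Wall", "-o", str(path.with_suffix("")), str(path)],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0, result.stderr

def generate_until_valid(ask_model: Callable[[str], str],
                         prompt: str,
                         max_rounds: int = 3) -> Optional[str]:
    """Feed compiler diagnostics back to the model each round; give up
    honestly (return None) rather than hand back code that never built."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = ask_model(prompt + feedback)
        ok, errors = try_compile(candidate)
        if ok:
            return candidate
        feedback = "\nYour previous attempt failed to compile:\n" + errors
    return None  # the "I don't know how to achieve this" outcome
```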
It obviously can be resolved; otherwise we wouldn't be able to self-correct ourselves. The "when" is unknown, but not the "if."
You really can’t compare free "check my algorithm" ChatGPT with $200/month "generate a working product" Claude Code.
I’m not saying Claude Code is perfect or a panacea, but those are really different products with orders of magnitude of difference in capability.
Claude 4? Or is Claude Code really so much better than, say, Aider also using Claude 4?
The scaffolding and system prompting around Claude 4 is really, really good. More importantly, it has advanced a lot in the last two months. I would definitely not assume the two are equal without testing.
It's both Claude 4 Opus and the secret sauce Claude Code has for UX (as well as Claude.md files for project/system rules and context) that is the killer, I think. The describe, build, test cycle is very tight and produces consistently high-quality results.
Aider feels a little clunky in comparison, which is understandable for a free product.
That's a pretty much impossible comparison to make. The workflows of the two are very different, and Aider has way more toggles. I can tell you that Aider using Sonnet 4 started a Node.js library in an otherwise Rust project when given the same prompt as Claude Code, which did finish the task.