Comment by misiti3780
1 day ago
I don't know where you are working, but where I work I can't prompt 90% of my job away using Cursor. In fact, I find all of these tools more and more useless as our codebase grows and becomes more complex.
Based on the current state of AI and the progress I'm witnessing on a month-by-month basis, my current prediction is that there is zero chance AI agents are going to be coding and replacing me in the next few years. If I could short the startups claiming this, I would.
Don't get distracted by claims that AI agents "replace programmers". Those are pure hype.
I'm willing to bet that in a few years most of the developers you know will be using LLMs on a daily basis, and will be more productive because of it (having learned how to use it).
this is already the case.
I have the same experience. It's basically a better StackOverflow, but just like with SO you have to be very careful about the replies, and also just like SO its utility diminishes as you get more proficient.
As an example, just today I was trying to debug some weird WebSocket behaviour. None of the AI tools could help, not Cursor, not plain old ChatGPT with lots of prompting and careful phrasing of the problem. In fact every LLM I tried (Claude 3.7, GPT o4-mini-high, GPT 4.5) introduced errors into my debugging code.
I’m not saying it will stay this way, just that it’s been my experience.
I still love these tools though. It’s just that I really don’t trust the output, but as inspiration they are phenomenal. Most of the time I just use vanilla ChatGPT though; never had that much luck with Cursor.
No one was forcing you to use SO; in fact, we made fun of people who did copy-paste/compile-coding.
Yeah, they're currently horrible at debugging -- there seem to be blind spots they just can't get past, so they end up running in circles.
A couple of days ago I was looking for something to do, so I gave Claude a paper ("A parsing machine for PEGs") to ask it some questions, and instead of answering me it spit out an almost complete implementation. Intrigued, I threw a couple more papers at it ("A Simple Graph-Based Intermediate Representation" and "A Text Pattern-Matching Tool based on Parsing Expression Grammars"), with which it fleshed out the implementation and, well... color me impressed.
Now the struggle begins, as the thing has to be debugged. With the help of both Claude and Deepseek we got it compiling and passing 2 out of 3 tests, which is where they both got stuck. Round and round we went until I, the human who's supposed to be doing no work, figured out that Claude had hard-coded some values (instead of coding a general solution for all input), which they both missed. In applying ever more complicated solutions (to a well-solved problem in compiler design), Claude finally broke all debugging output, and I don't understand the algorithms well enough to go in and debug it myself.
Of course I didn't use any sort of source code management, so I can't revert to a previous version from before it was broken beyond all fixing...
Honestly, I don't even consider this a failure. I learned a lot more about what they are capable of, and I now know that you have to give them problems in smaller sections where they don't have to figure out the complexities of how a few different algorithms interact with each other. With this new knowledge in hand, I started on what I originally intended to do before I got distracted by Claude's code solution to a simple question.
--edit--
Oh, the irony...
After typing this out and making an espresso I figured out the problem Claude and Deepseek couldn't see. So much for the "superior" intelligence.
One of the ways these tools are most useful for me is in extremely complex codebases.
This has become especially true for me in the past four months. The new long context reasoning models are shockingly good at digging through larger volumes of gnarly code. o3, o4-mini and Claude 3.7 Sonnet "thinking" all have 200,000 token context limits, and Gemini 2.5 Pro and Flash can do 1,000,000. As "reasoning" models they are much better suited to following the chain of a program to figure out the source of an obscure bug.
Makes me wonder how many of the people who continue to argue that LLMs can't help with large existing codebases are missing that you need to selectively copy the right chunks of that code into the model to get good results.
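To make "selectively copy the right chunks" concrete, here is a minimal sketch of one way it could be done by hand. Everything in it is an assumption for illustration: the function names (`select_files`, `build_prompt`) are hypothetical, relevance is just a keyword count, and the token estimate uses a rough 4-characters-per-token rule of thumb rather than any real tokenizer.

```python
from pathlib import Path


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. Real tokenizers differ.
    return max(1, len(text) // 4)


def select_files(root, keywords, budget_tokens, suffixes=(".py",)):
    """Pick files that mention any keyword, most hits first, within a token budget."""
    scored = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits = sum(text.count(k) for k in keywords)
        if hits:
            scored.append((hits, path, text))

    # Most keyword hits first; ties broken by path for a stable order.
    scored.sort(key=lambda item: (-item[0], str(item[1])))

    chosen, used = [], 0
    for _hits, path, text in scored:
        cost = estimate_tokens(text)
        if used + cost > budget_tokens:
            continue  # too big for what is left; try the next smaller file
        chosen.append(path)
        used += cost
    return chosen, used


def build_prompt(paths, question):
    """Concatenate the chosen files under simple headers, then append the question."""
    parts = [
        f"### {p}\n{p.read_text(encoding='utf-8', errors='ignore')}" for p in paths
    ]
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)
```

In practice you would probably pick the files by knowing the codebase rather than by keyword count; the point is only that the prompt is assembled deliberately and kept under the model's context limit.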
But 1 million tokens is like 50k lines of code or something. That's only medium sized. How does that help with large complex codebases?
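For what it's worth, the back-of-the-envelope math behind that figure can be sketched as follows. The tokens-per-line values are assumptions: 20 tokens per line is what the 50k estimate implies, and 10 is shown only as a comparison point. Real numbers depend on the language and the tokenizer.

```python
def lines_that_fit(context_tokens: int, tokens_per_line: int) -> int:
    """How many lines of code fit in a context window at a given density."""
    return context_tokens // tokens_per_line


# Context limits quoted upthread; tokens-per-line values are assumed.
for label, limit in [("200,000-token models", 200_000), ("1,000,000-token models", 1_000_000)]:
    for density in (10, 20):
        print(f"{label} at {density} tokens/line: {lines_that_fit(limit, density):,} lines")
```

At 20 tokens per line this gives 10,000 lines for a 200,000-token window and 50,000 lines for a 1,000,000-token window, before leaving any room for the question or the model's answer.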
What tools are you guys using? Are there none that can interactively probe the project in a way that a human would, e.g. use code intelligence to go-to-definition, find all references and so on?
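As a rough illustration of the kind of probing being asked about, here is a minimal sketch of "go to definition" and "find all references" for a single Python source string, using only the standard library `ast` module. The function names are hypothetical, it matches purely by name with no scope or import resolution, and it is not how any particular tool mentioned in this thread works.

```python
import ast


def find_definitions(source: str, name: str) -> list[int]:
    """Line numbers where a function or class called `name` is defined."""
    tree = ast.parse(source)
    kinds = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, kinds) and node.name == name
    )


def find_references(source: str, name: str) -> list[int]:
    """Line numbers where `name` is used as a bare name or as an attribute."""
    tree = ast.parse(source)
    lines = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id == name:
            lines.add(node.lineno)
        elif isinstance(node, ast.Attribute) and node.attr == name:
            lines.add(node.lineno)
    return sorted(lines)


SAMPLE = '''def parse(text):
    return text.split()

def main():
    tokens = parse("a b")
    return parse
'''

print(find_definitions(SAMPLE, "parse"))
print(find_references(SAMPLE, "parse"))
```

An agent with tools like these could ask for one definition at a time instead of needing the whole codebase in context, which is the trade-off the question is getting at.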