Comment by petesergeant
5 days ago
Randomly, my advice: don't sleep on this.
Three or four weeks ago I was posting about how LLMs were useful for one-off questions but I wouldn't trust them on my codebase. Then I spent my week's holiday messing around with them on some personal projects. I am now a fairly committed Roo user. There are lots of problems, but there is incredible value here.
Try it and see if you're still a hold-out.
I spent a good part of yesterday attempting to use ChatGPT to help me choose an appropriate API gateway. Over and over it suggested things that literally do not exist, and the only reason I could tell was that I had spent a good amount of time in the actual documentation. This has been my experience roughly 80% of the time when trying to use an LLM. I would like to know what magical prompt engineering technique makes it stop confidently hallucinating about literally everything.
> I spent a good part of yesterday attempting to use ChatGPT to help me choose an appropriate API gateway.
If you mean the ChatGPT interface, I suspect you're headed in the wrong direction.
Try Aider with the API interface. You can use whatever model you like (since you're paying per token). See my other comment:
https://news.ycombinator.com/item?id=44259900
I mirror the GP's sentiment. My initial attempts using a chat-like interface were poor. Then some months ago, due to many HN comments, I decided to give Aider a try. I had put my kid to bed and it was 10:45pm. My goal was "Let me just figure out how to install Aider and play with it for a few minutes - I'll do the real coding tomorrow." 15 minutes later, not only had I installed it, but my script was done. There was one bug I had to fix myself. It was production-quality code, too.
I was hooked. Even though I was done, I decided to add logging, command line arguments, etc. An hour later, it was a production-grade script, with a very nice interface and excellent logging.
Oh, and this was a one-off script. I'll run it once and never again. Now all my one-off scripts have excellent logging, because it's almost free.
There was no going back. For small scripts that I've always wanted to write, AI is the way to go. That script had literally been in my head for years. It was not a challenging task - but it had always been low on my priority list. How many ideas do you have in your head that you'll never get around to because of lack of time? Well, now you can do 5x more of those than you would have without AI.
Just wanted to add to your post with my anecdote.
I was at the "script epiphany" stage a few months ago and I got cool Bash scripts (with far more bells and whistles than I would normally implement) just by iterating with Claude via its web interface.
Right now I'm at the "Gemini (with Aider) is pretty good for knock-offs of the already existing functionality" stage (in a Go/HTMX codebase).
I'm yet to get to the "wow, that thing can add brand new functionality using code I'm happy with just by clever context management and prompting" stage; but I'm definitely looking forward to it.
I'm having a very good experience with ChatGPT at the moment. I'm mostly using it for little tasks where I don't remember the exact library functions. Examples:
"C++ question: how do I get the unqualified local system time and turn into an ISO time string?"
"Python question: how do I serialize a C struct over a TCP socket with asyncio?"
"JS question: how do I dynamically show/hide an HTML element?" (I obviously don't write a lot of JS :-D)
ChatGPT gave me the correct answers on the first try. I have been a sceptic, but I'm now totally sold on AI-assisted coding, at least as a replacement for Google and Stack Overflow. For me there is no point anymore in wading through all the blog spam and SEO crap just to find a piece of information. Stack Overflow is still occasionally useful, but the writing is on the wall...
EDIT: Important caveat: stay critical! I have been playing around asking ChatGPT more complex questions where I actually know the correct answer, or where I can immediately spot mistakes. It sometimes gives me answers that would look correct to a non-expert, but are hilariously wrong.
The problem with this approach is that you might lose important context which is present in the documentation but doesn't surface through the LLM. As an example, I just asked GPT-4o how to access the Nth character in a string in Go. Predictably, it answered str[n]. This is a wildly dangerous suggestion because it works correctly for ASCII but not for other UTF-8 characters. Sure, if you know about this and prompt it further, it tells you about this limitation, but that's not what 99% of people will do.
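To make that concrete, here's a minimal runnable sketch of the pitfall (the string literal is just an illustrative example I picked):

    package main

    import "fmt"

    func main() {
        s := "héllo" // "é" takes two bytes in UTF-8

        // str[n] indexes bytes, not characters: s[1] grabs only the
        // first byte of "é"'s two-byte encoding.
        fmt.Println(string(s[1])) // prints "Ã", not "é"

        // Converting to []rune indexes by Unicode code point instead.
        r := []rune(s)
        fmt.Println(string(r[1])) // prints "é"

        fmt.Println(len(s), len(r)) // 6 5: byte length vs. rune count
    }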
Sure, this was exactly how I felt three weeks ago, and I could have written that comment myself. The agentic approach, where it works out that it made something up by looking at the errors the type-checker generates, is what makes the difference.
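Roughly, the loop looks like this; callModel and applyAndTypeCheck are hypothetical stand-ins for whatever plumbing the agent (Roo, Aider, etc.) actually wires up:

    package main

    import (
        "fmt"
        "strings"
    )

    // Hypothetical stand-ins for the agent's real plumbing.
    func callModel(prompt string) string { return "...generated patch..." }

    func applyAndTypeCheck(patch string) []string {
        // A real agent would apply the patch and run e.g. `go build ./...`,
        // returning any compiler errors it collects.
        return nil
    }

    func main() {
        prompt := "add retry logic to the HTTP client"
        for attempt := 0; attempt < 3; attempt++ {
            patch := callModel(prompt)
            errs := applyAndTypeCheck(patch)
            if len(errs) == 0 {
                fmt.Println("patch type-checks") // a made-up API would have failed here
                return
            }
            // Feed the compiler's complaints back to the model; this is
            // where hallucinated functions get caught and corrected.
            prompt += "\nfix these errors:\n" + strings.Join(errs, "\n")
        }
    }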
Which model did you use?
I find using o3 or o4-mini and prompting "use your search tool" works great for having it perform research tasks like this.
I don't trust GPT-4o to run searches.
Did you use search grounding? o3 or o4-mini-high with search grounding (which usually comes on by default for questions like this) is typically the best option.
Did you try giving it the docs to read?
I will definitely sleep on agents. Normal LLM use, fine, but I am not giving up reasoning.
> Normal LLM use, fine, but I am not giving up reasoning.
Ouch! Reminds me of:
- I'm never going to use cell phones. I care about voice quality (me decades ago)
- I'm never going to use VoIP. I care about voice quality (everyone but me 2 decades ago).
- I'm never going to use a calculator. I am not going to give up on reasoning.
- I'm never going to let my kids play with <random other ethnicity>. I care about good manners.
https://en.wikipedia.org/wiki/False_dilemma
I'm never going to use the metaverse.
I'm never going to use blockchain.
I'm never going to use NFTs.
Sure, keep slowly offloading more and more of your brain to technology. Until you won't be needed anymore.
this is kind of a weird position to take. you're the captain, you're the person reviewing the code the LLM (agent or not) generates, you're the one asking for the code you want, you're in charge of deciding how much effort to put into things, and especially which things are most worth your effort.
all this agent stuff sounded stupid to me until I tried it out in the last few weeks, and personally, it's been great - I give a not-that-detailed explanation of what I want, point it at the existing code, and get back a patch to review once I'm done making my coffee. sometimes it's fine to just apply, sometimes I don't like a variable name or whatever, sometimes it doesn't fit in with the other stuff so I get it to try again, sometimes (<< 10% of the time) it's crap. the experience is pretty much like being a senior dev with a bunch of very eager juniors who read very fast.
anyway, obviously do whatever you want, but deriding something you've not looked into isn't a hugely thoughtful process for adapting to a changing world.
If I have to review all the code it's writing, I'd rather write it myself (maybe with the help of an LLM).
> anyway, obviously do whatever you want, but deriding something you've not looked in to isn't a hugely thoughtful process for adapting to a changing world.
I have tried it. Not sure I want to be part of such a world, unfortunately.
> the experience is pretty much like being a senior dev with a bunch of very eager juniors who read very fast.
I... don't want that. Juniors just slow me down because I have to check what they did and fix their mistakes.
(this is in the context of professional software development, not writing scripts, tinkering, etc.)
What's your definition of "agents" there?