Comment by camgunz
2 days ago
Can you say more than literally "you're using it wrong"? Otherwise this is a no true scotsman (super common when LLM advocates are touting their newfound productivity). Here are my prompts, lightly redacted:
First prompt:
``` Build a new package at <path>. Use the <blah> package at <path> as an example. The new package should work like the <blah> package, but instead of receiving events over HTTP, it should receive events as JSON over a Google Pub/Sub topic. This is what one such event would look like:
{ /* some JSON */ } ```
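For context on what that first prompt is asking for: the core of such a package is decoding each message's JSON body into a typed event. Since the actual JSON was redacted, this is only a hedged sketch with a hypothetical event shape (`Event`, `parseEvent`, and the field names are all illustrative, not the real package's API):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is a hypothetical shape for the redacted JSON payload;
// the real fields depend on the actual event schema.
type Event struct {
	ID      string          `json:"id"`
	Type    string          `json:"type"`
	Payload json.RawMessage `json:"payload"`
}

// parseEvent decodes a Pub/Sub message body into an Event.
func parseEvent(data []byte) (Event, error) {
	var e Event
	if err := json.Unmarshal(data, &e); err != nil {
		return Event{}, fmt.Errorf("decode event: %w", err)
	}
	return e, nil
}

func main() {
	e, err := parseEvent([]byte(`{"id":"evt-1","type":"created","payload":{}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(e.ID, e.Type)
}
```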
My assumptions when I gave it the following prompt were wrong, but it didn't correct me (it actually does sometimes, so this isn't an unreasonable expectation):
``` The <method> method will only process a single message from the subscription. Modify it to continuously process any messages received from the subscription. ```
These next 2 didn't work:
``` The context object has no method WithCancel. Simply use the ctx argument to the method above. ```
``` There's no need to attach this to the <object> object; there's also no need for this field. Remove them. ```
At this point, I fix it myself and move on.
``` There's no need to use a waitgroup in <method>, or to have that field on <object>. Modify <method> to not use a waitgroup. ```
``` There's no need to run the logic in <object> inside an anonymous function on a goroutine. Remove that; we only need the code inside the for loop. ```
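The simplification those two prompts were asking for, sketched with hypothetical names (the real method and object were redacted): when the caller already blocks on the work, wrapping the loop in a goroutine plus a `sync.WaitGroup` adds nothing, and the loop can run directly.

```go
package main

import "fmt"

// Before (the pattern being removed):
//   var wg sync.WaitGroup
//   wg.Add(1)
//   go func() { defer wg.Done(); for ... { ... } }()
//   wg.Wait()
//
// After: run processes items directly in the caller's goroutine; no
// WaitGroup or wrapper goroutine is needed when the caller blocks
// on this work anyway.
func run(items []string, handle func(string)) {
	for _, it := range items {
		handle(it)
	}
}

func main() {
	run([]string{"x", "y"}, func(s string) { fmt.Println(s) })
}
```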
``` Using the <package> package at <path> as an example, add metrics and logging ```
This didn't work for esoteric reasons:
``` On line 122 you're casting ctx to <context>, but that's already its type from this method's parameters. Remove this cast and the error handling for when it fails. ```
...but this fixed it:
``` Assume that ctx here is just like the ctx from <package>, for example it already has a logger. ```
There were some really basic errors in the test code. I thought I would just ask it to fix them:
``` Fix the errors in the test code. ```
That made things worse, so I just told it exactly what I wanted:
``` <field1> and <field2> are integers, just use integers ```
I wouldn't call it a "conversation" per se, but this is essentially what I see Kenton Varda, Simon Willison, et al doing.
Yeah, that was a pretty lazy response on my part. Let me try again.
In my opinion, it takes several weeks of active use to nail down your preferred workflow with these tools and to get a meaningful understanding of their abilities and limitations.
I.e., yes they hallucinate and don't have great understanding of truth/fact (however you choose to define those terms), but you need to develop an intuition for how to work around those issues and how to recognize the problems in your setup that increase the likelihood of the LLM heading down false paths. This intuition cannot come until you fight through the initial struggle period.
In some ways, it's similar to picking up emacs/vim and learning the shortcuts. It's a negative to your velocity until it's not, and once you overcome that initial hurdle, your productivity takes off. Admittedly, it's not for everyone (I never bothered to learn the ins and outs of vim bindings because my bottleneck isn't my speed of writing code), but it provides a huge productivity boost for those types of engineers.
Coming back to my main point: your LLM needs quite a bit of guidance in the early stages, especially as you're feeling out what types of tasks it's able to knock out of the park and what types it'll struggle with. For instance, in the example you gave here, I wonder what would happen if you asked it to present a detailed plan before it writes any code, along with a list of the assumptions it's making. You'll need to do a bit of review with it before you let it go execute the plan (similar to how a junior engineer would come to you with questions before being able to handle certain tasks).
I also recommend writing up a thorough self-review checklist that is stored in your repo (e.g. in an AGENTS.md file) and provides the customized instructions you want your LLM to follow (it won't always do so, but it helps a ton). Otherwise, each new session essentially starts over without the model learning anything, which is pretty frustrating.
I'm happy to talk more because I'm pretty optimistic about LLMs and enjoy using them in my day-to-day where appropriate.
And finally, I'm not sure how much you've thought about giving them more autonomy, but I do recommend doing so if you have a safe, sandboxed environment. The real magic and productivity boost of LLMs come when you give them some more autonomy and provide them with tools to figure out the problems they encounter, unlocking your time to be spent on higher-leverage tasks such as designing systems and processes. If it can run linters, unit tests, and grep your codebase during its development process and use this to iterate, you'll have a much more fun time.
Does this help?
Thanks! Uh, it helps a little. I'll see about making an AGENTS.md. I guess I can agree there's something to maximizing the amount of good output from an agent, and that to do that you need to give it a lot of access and info. I thought I did that, though, by giving it the whole codebase and linking the Pub/Sub docs. Maybe the job I gave it was too small, but that's hard for me to accept: I wanted an entirely new module. Is the next step like, ask it to make all the modules? What if I only need one?