Comment by camgunz
2 days ago
I finally took the plunge and did a big chunk of work in Cursor. It was pretty ideal: greenfield but with a very relevant example to slightly modify (the example pulled events over HTTP as a server and I wanted it to pull events over Google pub/sub instead).
Over IDK, 2-3 hours I got something that seemed on its face to work, but:
- it didn't use the pub/sub API correctly (the standard receive loop is sketched below)
- the 1 low-coverage test it generated didn't even compile (Go)
- there were a bunch of small errors it got confused by--particularly around closures
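For reference, the shape it should have had is basically the standard cloud.google.com/go/pubsub receive loop. This is only a sketch with made-up names (Event, Run), not my actual code:

```go
package events

import (
	"context"
	"encoding/json"
	"log"

	"cloud.google.com/go/pubsub"
)

// Event is a placeholder for the actual JSON event type.
type Event struct {
	ID string `json:"id"`
}

// Run blocks, pulling messages from the subscription until ctx is cancelled.
// Receive already fans work out across goroutines internally, so no extra
// WaitGroup or wrapper goroutine is needed around it.
func Run(ctx context.Context, projectID, subID string) error {
	client, err := pubsub.NewClient(ctx, projectID)
	if err != nil {
		return err
	}
	defer client.Close()

	sub := client.Subscription(subID)
	return sub.Receive(ctx, func(ctx context.Context, msg *pubsub.Message) {
		var ev Event
		if err := json.Unmarshal(msg.Data, &ev); err != nil {
			log.Printf("bad event: %v", err)
			msg.Nack()
			return
		}
		// ... handle ev ...
		msg.Ack()
	})
}
```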
I got it to "90%" (again though it didn't at all work) with the first prompt, and then over something like a dozen more mostly got it to fix its own errors. But:
- I didn't know the pub/sub API--I was relying on Cursor to do this correctly--and it totally submarined me
- I had to do all the digging to get the test to compile
- I had to go line by line and tell it to rewrite... almost everything
I quit when I realized I was spending more time prompting it to fix things than it would take me to fully engage my brain and fix them myself. I also noticed that there was a strong pull to "just do one more prompt" rather than dig in and actually understand things. That's super problematic to me.
Worse, this wasn't actually faster. How do I know that? The next day I did what I normally do: read docs and wrote it myself. I spent less time (I'm a fast typist and a Vim user) overall, and my code works. My experience matches pretty well w/ the results of TFA.
---
Something I will say though is there is a lot of garbage stuff in tech. Like, I don't want to learn Terraform (again) just to figure out how to deploy things to production w/o paying a Heroku-like premium. Maybe I don't want to look up recursive CTEs again, or C function pointers, or spend 2 weeks researching a heisenbug I put into code for some silly reason AI would have caught immediately. I am _confident_ we can solve these things without boiling oceans to get AI to do it for us.
But all this shit about how "I'm 20x more productive" is totally absurd. The only evidence we have of this is people just saying it. I don't think a 20x productivity increase is even imaginable. Overall productivity since 1950 is up 3.6x [0]. These people are asking us to believe they've achieved over 400 years of productivity gains in "3 months". Extraordinary claims require extraordinary evidence. My guess is either you were extremely unproductive before, or (like others are saying in the threads) in very small ways you're 20x more productive but most things are unaffected or even slower.
---

You're using it wrong -- it's intended to be a conversational experience. There are so many techniques you can utilize to improve the output while retaining your mental model of the codebase.

Respectfully, this is user error.
---

Can you say more than literally "you're using it wrong"? Otherwise this is a no true scotsman (super common when LLM advocates are touting their newfound productivity). Here are my prompts, lightly redacted:
First prompt:
```
Build a new package at <path>. Use the <blah> package at <path> as an example. The new package should work like the <blah> package, but instead of receiving events over HTTP, it should receive events as JSON over a Google Pub/Sub topic. This is what one such event would look like:

{ /* some JSON */ }
```
My assumptions when I gave it the following prompt were wrong, but it didn't correct me (it actually does sometimes, so this isn't an unreasonable expectation):
```
The <method> method will only process a single message from the subscription. Modify it to continuously process any messages received from the subscription.
```
These next 2 didn't work:
```
The context object has no method WithCancel. Simply use the ctx argument to the method above.
```
```
There's no need to attach this to the <object> object; there's also no need for this field. Remove them.
```
At this point, I fix it myself and move on.
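For context on the WithCancel one: in Go's standard library, WithCancel is a package-level function in the context package, not a method on the ctx value, which is presumably what the generated code tried to call. A trivial sketch of the working form, for illustration only:

```go
package main

import (
	"context"
	"fmt"
)

func main() {
	// WithCancel is a function in the context package, not a method on a
	// context.Context value; "ctx.WithCancel()" simply doesn't compile.
	ctx, cancel := context.WithCancel(context.Background())
	cancel() // cancel right away, just to show the effect

	fmt.Println(ctx.Err()) // prints "context canceled"
}
```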
```
There's no need to use a waitgroup in <method>, or to have that field on <object>. Modify <method> to not use a waitgroup.
```
```
There's no need to run the logic in <object> inside an anonymous function on a goroutine. Remove that; we only need the code inside the for loop.
```
```
Using the <package> package at <path> as an example, add metrics and logging
```
This didn't work for esoteric reasons:
```
On line 122 you're casting ctx to <context>, but that's already its type from this method's parameters. Remove this cast and the error handling for when it fails.
```
...but this fixed it:
```
Assume that ctx here is just like the ctx from <package>, for example it already has a logger.
```
There were some really basic errors in the test code. I thought I would just ask it to fix them:
```
Fix the errors in the test code.
```
That made things worse, so I just told it exactly what I wanted:
```
<field1> and <field2> are integers, just use integers
```
I wouldn't call it a "conversation" per se, but this is essentially what I see Kenton Varda, Simon Willison, et al doing.
---

Yeah, that was a pretty lazy response on my part. Let me try again.
In my opinion, it takes several weeks of active use to nail down your preferred workflow with these tools and to get a meaningful understanding of their abilities and limitations.
I.e., yes they hallucinate and don't have great understanding of truth/fact (however you choose to define those terms), but you need to develop an intuition for how to work around those issues and how to recognize the problems in your setup that increase the likelihood of the LLM heading down false paths. This intuition cannot come until you fight through the initial struggle period.
In some ways, it's similar to picking up emacs/vim and learning the shortcuts. It's a negative to your velocity until it's not, and once you overcome that initial hurdle, your productivity takes off. Admittedly, it's not for everyone (I never bothered to learn the ins and outs of vim bindings because my bottleneck isn't my speed of writing code), but it provides a huge productivity boost for those types of engineers.
Coming back to my main point: your LLM needs quite a bit of guidance in the early stages, especially as you're feeling out what types of tasks it's able to knock out of the park and what types of tasks it'll struggle with. For instance, in the example you gave here, I wonder what would happen if you asked it to present you with a detailed plan before it gets to writing any code and to provide a list of assumptions it is making? You will need to do a bit of review with it before you let it go execute the plan (similar to how a junior engineer would come to you with questions before being able to handle certain tasks).
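As a rough illustration of what I mean by a planning prompt (purely hypothetical wording, reusing your redacted placeholders):

```
Before writing any code, read the <blah> package at <path> and give me:

1. A step-by-step plan for the new Pub/Sub version of the package.
2. A list of every assumption you're making about the Pub/Sub API, the message format, and error handling.

Wait for my review before implementing anything.
```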
I also recommend writing up a thorough self-review checklist that is stored in your repo (e.g. in an AGENTS.md file) that provides the customized instructions you want your LLM to follow (it won't always do so, but it helps a ton). Otherwise, each new session is essentially starting over without it learning, which is pretty frustrating.
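For example, something like this (purely illustrative; the exact checklist should reflect your repo and tooling):

```
# AGENTS.md (sketch)

Before you consider a task done:

- Run `go build ./...` and `go vet ./...`; fix anything they report.
- Run `go test ./...`; tests must compile and pass.
- Follow the patterns in the existing packages rather than inventing new ones.
- List any assumptions you made that you could not verify against the code or docs.
```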
I'm happy to talk more because I'm pretty optimistic about LLMs and enjoy using them in my day-to-day where appropriate.
And finally, I'm not sure how much you've thought about giving them more autonomy, but I do recommend doing so if you have a safe, sandboxed environment. The real magic and productivity boost of LLMs come when you give them some more autonomy and provide them with tools to figure out the problems they encounter, unlocking your time to be spent on higher-leverage tasks such as designing systems and processes. If it can run linters, unit tests, and grep your codebase during its development process and use this to iterate, you'll have a much more fun time.
Does this help?