Comment by mjr00
3 days ago
> My intuition here is that this study mainly demonstrated that the learning curve on AI-assisted development is high enough that asking developers to bake it into their existing workflows reduces their performance while they climb that learning curve.
Definitely. Effective LLM usage is not as straightforward as people believe. Two big things I see a lot of developers do when they share chats:
1. Talk to the LLM like a human. Remember when internet search first came out, and people were literally "Asking Jeeves" in full natural language? Eventually people learned that you don't need to type "What is the current weather in San Francisco?" because "san francisco weather" gave you the same, or better, results. Now we've come full circle and people talk to LLMs like humans again, not out of any advanced prompt engineering, but just because it's so anthropomorphized it feels natural. But I can assure you that "pandas count unique values column 'Foo'" is just as effective an LLM prompt as "Using pandas, how do I get the count of unique values in the column named 'Foo'?" (a quick illustration follows this list). The LLM is also not insulted by you talking to it like this.
2. Don't know when to stop using the LLM. Rather than let the LLM take you 80% of the way there and then handle the remaining 20% "manually", they'll keep trying to prompt to get the LLM to generate what they want. Sometimes this works, but often it's just a waste of time and it's far more efficient to just take the LLM output and adjust it manually.
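For what it's worth, here is roughly what either phrasing of the pandas prompt converges on; the column name 'Foo' and the toy data are just placeholders:

```python
import pandas as pd

# Toy data standing in for whatever the question is actually about.
df = pd.DataFrame({"Foo": ["a", "b", "a", "c", "b", "a"]})

# Either prompt phrasing typically lands on one of these two answers:
print(df["Foo"].nunique())       # number of distinct values -> 3
print(df["Foo"].value_counts())  # rows per value -> a: 3, b: 2, c: 1
```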
Much like so-called Google-fu, LLM usage is a skill and people who don't know what they're doing are going to get substandard results.
> Rather than let the LLM take you 80% of the way there and then handle the remaining 20% "manually"
IMO 80% is way too much. LLMs are probably good for things that are outside your domain knowledge and where you can afford to not be 100% correct, like rendering the Mandelbrot set and similar simple functions.
LLMs are not deterministic: sometimes they produce correct code and other times they produce wrong code. This means one has to audit LLM-generated code, and auditing code takes more effort than writing it, especially if you are not the original author of the code being audited.
Code has to be 100% deterministic. As programmers we write code, detailed instructions for the computer (CPU), and we have developed a lot of tools, such as unit tests, to make sure the computer does exactly what we wrote.
A codebase has a lot of context that you gain by writing the code; some things just look wrong, and you know exactly why because you wrote the code. There is also a lot of context that you should keep in your head as you write the code, context that you miss when you simply prompt an LLM.
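To make the unit-test point concrete, here is a minimal sketch of what auditing an LLM-generated helper with a test might look like; the function and its name are made up for illustration:

```python
import pandas as pd

def count_unique(df: pd.DataFrame, column: str) -> int:
    """Hypothetical LLM-generated helper: number of distinct values in a column."""
    return df[column].nunique()

# The tests are what the human writes (or at least reviews) to pin down
# the behavior the generated code is supposed to have.
def test_count_unique_ignores_duplicates():
    df = pd.DataFrame({"Foo": ["a", "b", "a", "c"]})
    assert count_unique(df, "Foo") == 3

def test_count_unique_empty_column():
    df = pd.DataFrame({"Foo": []})
    assert count_unique(df, "Foo") == 0
```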
> Effective LLM usage is not as straightforward as people believe
It is not as straightforward as people are told to believe!
^ this, so much this. The amount of bullshit that gets shoveled into hacker news threads about the supposed capabilities of these models is epic.
"But I can assure you that "pandas count unique values column 'Foo'" is just as effective an LLM prompt as "Using pandas, how do I get the count of unique values in the column named 'Foo'?""
How can you be so sure? Did you compare in a systematic way or read papers by people who did it?
Now, I certainly get results giving the LLM only snippets and keywords, but for anything complex I do notice differences depending on how I articulate the prompt. I'm not claiming there is a significant difference, but that is how it seems to me.
> How can you be so sure? Did you compare in a systematic way or read papers by people who did it?
No, but I didn't need to read scientific papers to figure out how to use Google effectively, either. I'm just using a results-based analysis after a lot of LLM usage.
Well, I did need some tutorials to use Google efficiently in the old days, when + meant something specific.
Other people don't have the benefit of your experience, though, so there's a communication gap here: this boils down to "trust me, bro."
How do we get beyond that?
> But I can assure you that "pandas count unique values column 'Foo'" is just as effective an LLM prompt as "Using pandas, how do I get the count of unique values in the column named 'Foo'?"
While the results are going to be similar, typing a question in full can help you think about it yourself too, as if the LLM is a rubber duck that can respond back.
I've found myself adjusting and rewriting prompts during the process of writing them, before I even ask the LLM anything, because as I was writing the prompt I was thinking about the problem simultaneously.
Of course, for simple queries like "write me a function in C that calculates the length of a 3d vector using vec3 for the type" you can write it like "c function vec3 length 3d" or something like that instead, and the LLM will give more or less the same response (tried it with Devstral).
But TBH, to me that sounds like programmers using Vim claiming they're more productive than users of other editors because they need fewer keystrokes.
> Talk to the LLM like a human
Maybe the LLM doesn't strictly need it, but typing it out does bring some clarity for the asker. I've found it helps a lot to catch myself - what do I even want from this?
I'm not sure about your example about talking to LLMs. There is good reason to think that speaking to it like a human might produce better results, as that's what most of the training data is composed of.
I don't have any studies, but it seems reasonable to me to assume.
(Unlike Google, which presumably just matched on keywords anyway.)
> I'm not sure about your example about talking to LLMs. There is good reason to think that speaking to it like a human might produce better results, as that's what most of the training data is composed of.
In practice I have not had any issues getting information out of an LLM when speaking to them like a computer, rather than a human. At least not for factual or code-related information; I'm not sure how it impacts responses for e.g. creative writing, but that's not what I'm using them for anyway.