Comment by monsieurbanana
13 hours ago
> LLMs produce results on par with what I would expect out of a solid junior developer
This is a common take but it hasn't been my experience. LLMs produce results that vary from expert-level all the way down to slightly better than Markov chains. The average result might be on par with a junior developer, and the worst case doesn't happen that often, but the fact that it happens from time to time makes them completely unreliable for a lot of tasks.
Junior developers are much more consistent. Sure, you will find the occasional developer who would delete the test file rather than fix the tests, but either they will learn their lesson after seeing your wth face or you can fire them. Can't do that with LLMs.
I think any further discussion about quality just needs to have the following metadata:
- Language
- Total LOC
- Subject matter expertise required
- Total dependency chain
- Subjective score (audited randomly)
And we can start doing some analysis (rough sketch of what I mean below). Otherwise we're pissing into ten kinds of winds.
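Something like this, as a very rough sketch; all the field names, types and values here are my own invention, purely to make the list above concrete:

```python
# Rough sketch, not a spec: field names and values are illustrative only.
from dataclasses import dataclass

@dataclass
class TaskReport:
    language: str            # e.g. "rust", "html/css"
    total_loc: int           # lines of code the LLM produced or touched
    expertise_required: str  # subject matter expertise needed: "low" / "medium" / "high"
    dependency_chain: int    # total number of (transitive) dependencies involved
    subjective_score: float  # 0-10, audited randomly by a human

reports = [
    TaskReport("html/css", 400, "low", 3, 9.0),
    TaskReport("rust", 12_000, "high", 85, 4.5),
]

# Only with a pile of these could we do real analysis instead of trading anecdotes.
average = sum(r.subjective_score for r in reports) / len(reports)
print(f"average subjective score: {average:.1f}")
```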
My own subjective experience: earth-shattering at webapps in HTML and CSS (because I'm terrible and slow at it), annoyingly good but usually a bit wrong at planning and optimization in Rust, and horribly lost at systems design or debugging a reasonably large Rust system.
I agree that these discussions (this whole HN thread, tbh) are too lacking in concrete examples to be more than holy wars 3.0.
Besides one point: junior developers can learn from their egregious mistakes; LLMs can't, no matter how strongly worded you are in their system prompt.
In a functional work environment, you will build trust with your coworkers little by little. The pale equivalent with LLMs is improving system prompts and writing more and more AI directives that may or may not be followed.
This seems to be one of the huge weaknesses of current LLMs: Despite the words "intelligence" and "machine learning" we throw around, they aren't really able to learn and improve their skills without someone changing the model. So, they repeat the same mistakes and invent new mistakes by random chance.
If I were tutoring a junior developer and he accidentally deleted the whole source tree or did something equally egregious, that would be a milestone learning point in his career, and he would never ever do it again. But if the LLM does it accidentally, it will be apologetic, yet after the next context-window clear it has the same chance of doing it again.
> Besides one point: junior developers can learn from their egregious mistakes, llms can't no matter how strongly worded you are in their system prompt.
I think if you set an LLM off to do something and it makes an "egregious mistake" in the implementation, and then you adjust the system prompt to explicitly guard against that or to steer it towards a different implementation, restart from scratch, and it makes the exact same "egregious mistake", then you need to try a different model/tool than the one you've been using.
It's common for smaller models, or bigger models that are heavily quantized, not to be great at following system/developer prompts, but that really shouldn't happen with the available SOTA models; I haven't had something ignored like that in years.
I can in fact fire an LLM. It's even easier than firing a junior developer.
Or rather, it's more like a contractor. If I don't like the job they did, I don't give them the next job.