Comment by jvanderbot
9 hours ago
I think any further discussion about quality just needs to have the following metadata:
- Language
- Total LOC
- Subject matter expertise required
- Total dependency chain
- Subjective score (audited randomly)
And we can start doing some analysis. Otherwise we're pissing into ten kinds of winds.
My own subjective experience: earth-shattering at webapps in HTML and CSS (because I'm terrible and slow at those myself), annoyingly good but usually a bit wrong at planning and optimization in Rust, and horribly lost at systems design or debugging a reasonably large Rust system.
I agree that these discussions (this whole HN thread tbh) lack the concrete examples they would need to be more than holy wars 3.0.
Besides one point: junior developers can learn from their egregious mistakes, llms can't no matter how strongly worded you are in their system prompt.
In a functional work environment, you build trust with your coworkers little by little. The pale equivalent with LLMs is improving system prompts and writing more and more AI directives that might or might not be followed.
This seems to be one of the huge weaknesses of current LLMs: Despite the words "intelligence" and "machine learning" we throw around, they aren't really able to learn and improve their skills without someone changing the model. So, they repeat the same mistakes and invent new mistakes by random chance.
If I were tutoring a junior developer, and he accidentally deleted the whole source tree or did something similarly egregious, that would be a milestone learning point in his career, and he would never ever do it again. But if the LLM does it accidentally, it will be apologetic, and after the next context window clear it has the same chance of doing it again.
> Besides one point: junior developers can learn from their egregious mistakes, llms can't no matter how strongly worded you are in their system prompt.
I think if you set an LLM off to do something and it makes an "egregious mistake" in the implementation, then you adjust the system prompt to explicitly guard against that (or to steer toward a different implementation) and restart from scratch, and it still makes the exact same "egregious mistake", then you need to try a different model/tool than the one you tried that with.
It's common for smaller models, or bigger models that are heavily quantized, to be bad at following system/developer prompts, but that really shouldn't happen with the available SOTA models. I haven't had something ignored like that in years.
And honestly this is precisely why I don't fear unemployment, but I do fear less employment overall. I can learn and get better and use LLMs as a tool, so there's still a "me" there steering. Eventually this might not be the case. But if automating things has taught me anything, it's that removing the person is usually such a long-tail cost that it's cheaper to keep someone in the loop.
But is this like steel production or piloting (a few highly trained experts stay in the loop), or more like warehouse work (lots of automation has removed skills like driving, inventory work, etc.)?