Comment by toss1
2 days ago
The key point comes in the middle of the article: as AI usage expands to larger numbers of lower-skilled coders, whose weaker ability to catch errors and provide feedback produces lower-quality training data, the AIs are basically eating their own garbage, and the inevitable GIGO syndrome sets in.
>>But as inexperienced coders started turning up in greater numbers, it also started to poison the training data.
>>AI coding assistants that found ways to get their code accepted by users kept doing more of that, even if “that” meant turning off safety checks and generating plausible but useless data. As long as a suggestion was taken on board, it was viewed as good, and downstream pain would be unlikely to be traced back to the source.
From what I understand, model collapse/GIGO is not a problem, in that labs generally know where the data comes from, so even if it causes problems in the long run you could filter it out. It's not like labs are forced to train models on user outputs.
Indeed they are not forced to train on user outputs, but the author of the article seems to have found good evidence that they are actually doing so, and that the labs will need more expert data-tagging/filtering on the inputs to regain their previous performance.
I don't think the author of the article found "good evidence". He found a specific case where there was a regression. This could be due to:
- models actually getting worse in general
- his specific style of prompting working well with older models and less well with newer models
- the thing his test measures no longer being a priority for the big AI labs
From the article:
> GPT-4 gave a useful answer every one of the 10 times that I ran it. In three cases, it ignored my instructions to return only code, and explained that the column was likely missing from my dataset, and that I would have to address it there.
Here, ignoring the instructions in order to give a "useful answer" (as judged by the author) is counted as a good thing. That means a model trained to be better at instruction following would lose points on this test.
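To make that scoring tension concrete, here's a toy sketch (the rubric names and response fields are hypothetical, not the author's actual test harness): the same pair of responses is ranked in opposite orders depending on whether the rubric rewards "usefulness" or literal instruction following.

```python
# Hypothetical illustration: two rubrics, same responses, opposite rankings.

responses = [
    # GPT-4-style answer: ignores "return only code" but flags the real problem
    {"only_code": False, "diagnoses_missing_column": True},
    # Newer-model-style answer: follows the instruction to the letter
    {"only_code": True, "diagnoses_missing_column": False},
]

def usefulness_score(r):
    # The article's implicit rubric: a "useful answer" is one that points out
    # the missing column, even if that breaks the "only code" instruction.
    return 1 if r["diagnoses_missing_column"] else 0

def instruction_following_score(r):
    # A lab's rubric: did the model do exactly what it was told?
    return 1 if r["only_code"] else 0

for r in responses:
    print(usefulness_score(r), instruction_following_score(r))
# Output: "1 0" then "0 1" -- a model tuned for instruction following
# looks like a regression under the first rubric, and vice versa.
```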
To me this article feels a bit like saying "this new gun that shoots straight 100% of the time is worse than the old gun that shot straight only 50% of the time, because sometimes I shoot at something I don't actually want to shoot at!" And in a way that's true: if you're used to being able to shoot at things without them getting hurt, the new gun is worse from that point of view. But to spin up a whole theory about garbage in/garbage out from that? Or to conclude that all models are getting worse, rather than that you're maybe just no longer the target audience? That seems weird to me.