Comment by Quarrel
17 hours ago
I really like this as a suggestion, but getting open-source code that isn't in an LLM's training data is a challenge.
Then, with each model having a different training cutoff, you end up with no useful basis for comparison when deciding whether new models are improving the situation. I don't doubt they are; I'm just not sure this is a way to show it.
Yes, but perhaps being trained on a piece of code has less impact on the ability to find bugs in that code than you'd expect. You could run a set of experiments to find out, and that would be interesting in itself.