
Comment by Quarrel

17 hours ago

I really like this as a suggestion, but getting open-source code that isn't in the LLM's training data is a challenge.

Then, with each model having a different training cutoff, you end up with no useful comparison for deciding whether new models are improving the situation. I don't doubt they are; I'm just not sure this is a way to show it.

Yes, but perhaps being trained on a piece of code has only a small impact on a model's ability to find bugs in that code. You could run a set of experiments to find out, and that would be interesting in itself.