Comment by simonw
3 hours ago
That's why the major AI labs are really careful about the code they include in the training runs.
The days of indiscriminately scraping every scrap of code on the internet and pumping it all in are long gone, from what I can tell.
Well, if, as the OP points out, it is 'all garbage', they don't have a whole lot of room to discriminate.
Do you have any pointers on this?
Would be a great resource to understand what works and what doesn't.