Comment by YetAnotherNick

2 years ago

A full scrape of public GitHub is available to anyone. GPT-4 was trained before the Microsoft deal, so I don't think its performance is because of GitHub access. And GPT-4 is significantly better than the second-best model in every field, not just coding.

And there is no evidence that GitHub is violating any open-source licenses.

So they are going to be training on exactly the same data that is available to all.