Comment by vanuatu
13 days ago
I don't think it's much of an issue
- RL envs + synthetic data + human-annotated data
- Usage data from Codex/Claude Code/Cursor
Most of a model's coding ability comes from post-training, not pretraining
A better question is what's left for those who don't have access to that. We went from publicly available data to data vacuumed up from private users
Open source models
Unfortunately, all the incentives right now are for repos to be private
Open source models are for rich people: only they can afford the hardware needed to run them.