Comment by fastball
16 days ago
Yes, that is in fact how models get better at coding.
Such a ridiculous stance: "I want LLMs to code for me, but I want them to be trained on other people's code, not mine, duh".
> "I want LLMs to code for me, but I want them to be trained on other people's code, not mine, duh".
Who ever said that? Have you actually heard that from your fellow programmers in real life?
If the code I wrote actually made even the slightest discernible difference in LLMs I'd be so honored. But it won't happen, as it's just 0.00001% of all the training data.
Real life? Mostly not. Hacker News? Absolutely. Literally the comment I am replying to.
> But it won't happen, as it's just 0.00001% of all the training data.
Are you familiar with the Tragedy of the Commons?
The Tragedy of the Commons is just an analogy, not a fact.
Sounds good? They can pay for code they want to train on. There are plenty of companies sending me offers to code training materials for them for $50-100/hr. Don’t expect to charge me an arm and a leg for inference and then also train on my code.
There are already opt-out buttons for training in Cursor and Claude Code. If you don’t want it, then turn it off. If it were worth enough money to them, they would offer a monetary incentive like discounts, but none of them have yet.
They are talking about the millions of lines of code they stole to make the product in the first place and I'm sure you know that.
This just makes the inference more expensive for you?
How about: "I do not want it to code for me or anyone if it steals from someone."
Interesting how our generation, which grew up using Napster, now has so many intellectual property extremists. By this logic, even humming a tune you heard on the radio is theft.
Ok, now that you mentioned it, I actually want that.
Who are you quoting?
It's a common sentiment. An example from a few hours ago: https://news.ycombinator.com/item?id=48558954
> I have absolutely zero interest in free. I honestly don't think I'm even remotely in the same demographic as people using free tiers / models. I want to pay. I don't want my data used for training...
They want to use LLMs trained on others' code but don't want to contribute their own.
Not casting judgement, just pointing out.
It makes sense from a business perspective: SaaS firms value the ability of coding agents to accelerate development, but also worry the models will learn the secret sauce of their business and destroy its moat. So their desire to contractually exclude training on their data has some logic to it.
(Disclaimer: Not speaking for or about my current employer, just a general industry observation.)
I don't really use LLMs myself, but if someone wants to have any kind of software business then having the models trained on their products isn't ideal.
I mean, AlphaZero et al. start from zero. I learned by writing my own code, with nothing but documentation and some textbooks.
Fair point, an AlphaZero of code would be very interesting indeed.
This is the correct take