Comment by chii
4 days ago
And that is how it should be - the knowledge that the LLM was trained on should be free, and cannot be (and should never be) gatekept behind money.
It's merely the hardware that should be charged for - which ought to drop in price if/when the demand for it rises. However, this is a bottleneck at the moment, and it's hard to see how it gets resolved amid the current US environment of sanctioning anyone who would try.
Is there no value in how the training was done such that it's accessible via inference in a particularly useful way?
That value is there, but Google has decided to give it away as public knowledge (à la their transformer paper).
And I would also argue that the researchers doing this are standing on the shoulders of other public knowledge - things funded by public institutions with taxpayer money.
No, a lot of the data they were trained on was pirated.