Comment by underlines
3 days ago
Yes, every major LLM company did it:
illegally using Anna's Archive, The Pile, Common Crawl, their own crawls, Books2, LibGen, etc., then embedding the text into a high-dimensional space and training next-token prediction on it.