Comment by gnerd00
2 days ago
this is an important question. I am not a specialist in this area. I believe what you say was true before the invention of the Transformer ML architecture around 2015. I believe that among practitioners close to the Transformer effort, they passed around "The Common Crawl" because it was standard and basically one file chunk. I suspect that Bible material was just one part of CommonCrawl. more info welcome
No comments yet
Contribute on Hacker News ↗