Comment by gnerd00

2 days ago

This is an important question. I am not a specialist in this area. I believe what you say was true before the invention of the Transformer ML architecture around 2017. I believe that practitioners close to the Transformer effort passed around Common Crawl because it was standard and basically one big chunk of data. I suspect that Bible material was just one part of Common Crawl. More info welcome.