Comment by nashashmi

5 days ago

I heard or read that LLM translation systems are trained on Bible translations because the Bible has been translated into more languages than any other book.

This is an important question. I am not a specialist in this area. I believe what you say was true before the invention of the Transformer ML architecture in 2017. I believe that practitioners close to the Transformer effort passed around Common Crawl because it was standard and came as basically one bulk dataset. I suspect that Bible material was just one part of Common Crawl. More info welcome.