Comment by Oras, 3 months ago
Hard time? What value do adult video descriptions, views, and comments add to small (7B, 32B) models?

andy99, 3 months ago
It says it’s Common Crawl. I interpret that to mean this is a generic web-scrape dataset; presumably they filter out the stuff they don’t want before pretraining. You’d have to do some ablation testing to know what value it adds.

    ccgreg, 3 months ago
    Common Crawl is a particular dataset: commoncrawl.org

khimaros, 3 months ago
What if that's where they learned how to utilize the double entendre? Hard times indeed.