Comment by bretbernhoft
3 years ago
I'm really excited to see another link to The Gutenberg Project. They do amazing work. And if their download stats are accurate, they must be one of the largest libraries of PDFs on the public Internet.
> excited to see another link to [Project Gutenberg] ... they ...
It is a bit different: they started it all. The project began in 1971, when Michael S. Hart, then a student, was given access to a computer and realized it would be a good idea to digitize texts for preservation and distribution.
Project Gutenberg is one of the foremost cultural preservation and promotion projects of the 20th century.
A while ago, I was helping a student collect 19th-century texts for corpus analysis. Since the books were out of copyright, PDFs were downloadable from Google and the Internet Archive. Although the scanned versions from the two sources were equivalent, the OCRed versions were very different. The OCRed texts from Google had a very low error rate and could be easily corrected by hand. The ones from IA were unusable, with many severe typos on every line.