Comment by throwup238
1 day ago
I use the Portal de Archivos Españoles [1] for Spanish colonial documents. Each country has its own archive, but the Spanish one has the most content (35 million digitized pages).
The hard part is knowing where to look: most of the images haven’t gone through HTR/OCR or indexing, so you have to understand Spanish colonial administration and go through the collections to find stuff.
Want to collab on a database and some clustering and analysis? I’m a data scientist at FAIR with an interest in antiquarian docs and books
Hit me up if you can. I’m focused on Neo-Latin texts from the Renaissance. Less than 30% of known book editions have been scanned and less than 5% translated. And that’s before even getting to the manuscripts.
https://Ancientwisdomtrust.org
Also working on kids’ handwriting recognition for https://smartpaperapp.com
Sounds actually perfect. I’ll send you an email. Thank you!
Sadly, I'm just an amateur armchair historian (at best), so I doubt I'd be of much help. I'm mostly doing the translation for my own edification.
You may be surprised (or not?) at how many important scientific and historical works are done by armchair practitioners.
You should maybe reach out to the author of this blog post, Professor Mark Humphries. Or to the genealogy communities: we regularly struggle with handwritten historical texts that no public AI model can make a dent in.
Spaniard here. Let me know if I can somehow help navigate all of that. I’m very interested in history and everything related to the 1400-1500 period (although I’m not an expert by any definition), and I’d love to see what modern technology could do here, especially OCR and VLMs.
Awesome, thank you!