Comment by throwup238

3 months ago

I use the Portal de Archivos Españoles [1] for Spanish colonial documents. Each country has its own archive, but the Spanish one has the most content (35 million digitized pages).

The hard part is knowing where to look: most of the images haven’t gone through HTR/OCR or indexing, so you have to understand Spanish colonial administration and browse the collections to find things.

[1] https://pares.cultura.gob.es/pares/en/inicio.html

throwout4110 3 months ago

Want to collab on a database and some clustering and analysis? I’m a data scientist at FAIR with an interest in antiquarian docs and books

dr_dshiv 3 months ago
Hit me up if you can. I’m focused on Neo-Latin texts from the Renaissance. Less than 30% of known book editions have been scanned, and less than 5% have been translated. And that’s before even getting to the manuscripts.
https://Ancientwisdomtrust.org
Also working on kids’ handwriting recognition for https://smartpaperapp.com
- throwout4110 3 months ago
  
  Sounds perfect, actually. I’ll send you an email. Thank you!
throwup238 3 months ago
Sadly, I'm just an amateur armchair historian (at best), so I doubt I'd be of much help. I'm mostly doing the translation for my own edification.
- cco 3 months ago
  
  You may be surprised (or not?) at how many important scientific and historical works are done by armchair practitioners.
- throwout4110 3 months ago
  
  No problem at all, if you have some databases or catalogs I’d be interested in learning more
rmonvfer 3 months ago
Spaniard here. Let me know if I can help you navigate all of that. I’m very interested in history, especially everything related to the 1400-1500 period (although I’m not an expert by any definition), and I’d love to see what modern technology could do here, especially OCR and VLMs.
- throwout4110 3 months ago
  
  Awesome thank you!
vintermann 3 months ago

You should maybe reach out to the author of this blog post, Professor Mark Humphries, or to the genealogy communities. We regularly struggle with handwritten historical texts that no public AI model can make a dent in.