Comment by mcswell
3 days ago
Obviously this was whimsical when it came out. However, we were creating synthetic data for training and testing OCR in multiple scripts. We would take a web page in some language with a non-Roman script and reproduce it as multiple PDFs using different fonts. We also degraded the images in various ways, blurring them with ImageMagick and, of course, staining them with this very coffee stains program!