Comment by faebi
14 days ago
Shouldn't it be easy to generate a lot of OCR data? Generate HTML, randomize, generate image, apply noise and let it train on it.
14 days ago
Yes, but if you aren't careful you will end up with a model carefully tuned for the ways that you add noise, not for all the types of noise found in the real world. Still, stuff like this can be very useful for base training, especially if you follow it up with many real-world examples.
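For illustration, here is a rough sketch of the "randomize, then apply noise" part of that pipeline, using only the Python standard library. Everything in it is an assumption made up for the example: the word list, the function names (`random_html`, `salt_and_pepper`, `brightness_shift`, `augment`), and the two noise types. The HTML-to-image rendering step is deliberately left out, so the page is just a grid of grayscale values in 0..255. Picking a random subset of noise functions in random order is one way to avoid training on a single fixed noise recipe, though it still only covers the noise you thought of.

```python
import random

# Hypothetical vocabulary; a real generator would draw from a much larger corpus.
WORDS = ["invoice", "total", "date", "amount", "address", "signature"]


def random_html(rng):
    """Return (html, ground_truth_text) with randomized content and styling."""
    text = " ".join(rng.choice(WORDS) for _ in range(rng.randint(2, 5)))
    size = rng.randint(10, 32)
    family = rng.choice(["serif", "sans-serif", "monospace"])
    html = f'<p style="font-size:{size}px;font-family:{family}">{text}</p>'
    return html, text


def salt_and_pepper(pixels, rng, prob):
    """Replace each pixel with pure black or white with probability `prob`."""
    return [
        [rng.choice((0, 255)) if rng.random() < prob else p for p in row]
        for row in pixels
    ]


def brightness_shift(pixels, rng, max_shift):
    """Add one random offset to every pixel, clamped to the 0..255 range."""
    shift = rng.randint(-max_shift, max_shift)
    return [[min(255, max(0, p + shift)) for p in row] for row in pixels]


def augment(pixels, rng):
    """Apply a random subset of the noise functions, in random order."""
    ops = [
        lambda px: salt_and_pepper(px, rng, rng.uniform(0.0, 0.05)),
        lambda px: brightness_shift(px, rng, 40),
    ]
    rng.shuffle(ops)
    for op in ops[: rng.randint(0, len(ops))]:
        pixels = op(pixels)
    return pixels


if __name__ == "__main__":
    rng = random.Random(0)
    html, label = random_html(rng)
    # Stand-in for a rendered page: a blank white 4x8 grayscale image.
    page = [[255] * 8 for _ in range(4)]
    noisy = augment(page, rng)
    print(html)
    print(label)
    print(noisy)
```

Each generated sample pairs the noisy image with `label` as its ground truth. Real scans also have blur, skew, compression artifacts, stains and so on, which is why mixing in real-world examples afterwards still matters.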