Comment by ArcHound

1 month ago

AFAIK at least the Comet browser uses OCR, so I worry that the "OCR not feasible" argument is sadly wrong.

The Comet browser is different from scraping, though, no? Not that I’d ever use this, but the goal doesn’t seem to be “no AI can ever touch this” but rather “large-scale training-data scrapers find useless garbage.”

  • I'd say it's a good PoC.

    They want to have many users, so they are OK with running OCR for many users. And since they are already sending the accessed content through their APIs, they might as well send a copy of it to training.

    In conclusion, it seems that mass OCR usage is well within reach of the AI companies.