Comment by fny

8 days ago

A little secret: Apple’s Vision Framework has an absurdly fast text recognition library with accuracy that beats Tesseract. It consumes almost any image format you can think of, including PDFs.

I wrote a simple CLI tool and a more fully featured Python wrapper for it: https://github.com/fny/swiftocr
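
For anyone curious what the underlying call looks like, here's a minimal sketch using the standard Vision API (the file path is a placeholder, and this isn't necessarily how swiftocr structures it):

    import Foundation
    import Vision

    // Recognize text in a single image and print each recognized line.
    let imageURL = URL(fileURLWithPath: "input.png") // placeholder path

    let request = VNRecognizeTextRequest { request, error in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        for observation in observations {
            // Each observation holds ranked candidate strings; take the best one.
            if let best = observation.topCandidates(1).first {
                print(best.string)
            }
        }
    }
    request.recognitionLevel = .accurate   // slower but noticeably better than .fast
    request.usesLanguageCorrection = true

    let handler = VNImageRequestHandler(url: imageURL, options: [:])
    do {
        try handler.perform([request])
    } catch {
        print("Recognition failed: \(error)")
    }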

This has been one of my favorite features Apple has added. When I'm on a call and someone shares a page I need the link to, rather than interrupting the speaker to ask them to share it, it's often faster to screengrab the URL, let Apple OCR the address, and go to the page or post it in chat.

After getting an iPhone and being really impressed with the system-provided features, I explored some of Apple's API documentation and was blown away by what's available. My app experience on iOS vs. Android is night and day. The Vision features alone have been insane, but the text recognition is just fantastic. Any image, and even my god-awful handwriting, gets picked up without issue.

That said, I do love me a free and open source option for this kind of thing. I can't use it much since I'm not using Apple products for my desktop computing. Good on Apple though - they're providing some serious software value.

  • I can't comment on what Apple is doing here, but Google has an equivalent called Lens, which works really well, and I use it in the way you suggest here.

How does it work with tables and diagrams? I have scanned pages with mixed media, where some regions are diagrams. I want to extract the text but also be told where the diagrams are in the image, with coordinates.
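
For what it's worth, Vision's text observations do come with normalized bounding boxes, so the text-location half of this is doable; the diagram regions themselves aren't labeled, though you could infer them as the areas left over. A sketch, with placeholder image dimensions and file path:

    import Foundation
    import Vision

    // Print each recognized string with its bounding box in pixel coordinates.
    let imageWidth = 1920   // placeholder: the source image's pixel size
    let imageHeight = 1080

    let request = VNRecognizeTextRequest { request, _ in
        guard let observations = request.results as? [VNRecognizedTextObservation] else { return }
        for obs in observations {
            // boundingBox is normalized (0...1, origin at bottom-left); convert to pixels.
            let rect = VNImageRectForNormalizedRect(obs.boundingBox, imageWidth, imageHeight)
            let text = obs.topCandidates(1).first?.string ?? ""
            print("\(text) -> \(rect)")
        }
    }

    let handler = VNImageRequestHandler(url: URL(fileURLWithPath: "scan.png"), options: [:])
    do {
        try handler.perform([request])
    } catch {
        print("Recognition failed: \(error)")
    }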

I wonder if it's possible to reverse engineer that, rip it out, and run it on Linux. Would love to have that feature without having to use Apple hardware.