Comment by rgovostes

9 months ago

The API: https://developer.apple.com/documentation/vision/recognizing...

In my experience it works remarkably well for features like scanning documents in Notes and copying or translating text embedded in images in Safari.

It is not open source, but it is free to use locally. Someone has written a Python wrapper (apple-ocr) around it if you want to use it in other workflows. The model files might be in /System/Library/PrivateFrameworks/TextRecognition.framework if you want to port them to other platforms.

I also wrote a Swift CLI that wraps the Vision framework: https://github.com/nexuist/seev

Text extraction is supported (including the ability to specify custom words not found in the dictionary), and there are also utilities for face detection, classification, etc.
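
For anyone who wants to call the framework directly, here is a minimal sketch of text recognition with custom words using Vision's VNRecognizeTextRequest. This is not taken from seev or apple-ocr; the function name recognizeText and the file name in the usage comment are made up for illustration, and it assumes a recent macOS SDK where request.results is already typed as text observations.

```swift
import Foundation
import Vision

/// Hypothetical helper: recognize text in the image at `url`,
/// returning one string per detected line of text.
func recognizeText(at url: URL, customWords: [String] = []) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate    // slower but more precise than .fast
    request.usesLanguageCorrection = true   // custom words are only used when this is on
    request.customWords = customWords       // extra vocabulary not in the built-in lexicon

    // The handler loads the image and runs the request synchronously.
    let handler = VNImageRequestHandler(url: url, options: [:])
    try handler.perform([request])

    // Each observation is one detected text region; take its best candidate.
    let observations = request.results ?? []
    return observations.compactMap { $0.topCandidates(1).first?.string }
}

// Usage (the path and the custom word are placeholders):
// let lines = try recognizeText(at: URL(fileURLWithPath: "scan.png"),
//                               customWords: ["seev"])
// lines.forEach { print($0) }
```

Everything runs on-device, so this works offline. Switching recognitionLevel to .fast trades accuracy for speed if you are processing many images.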