Comment by yjftsjthsd-h

7 months ago

Eh, what AI taketh, AI can give; modern OCR has gotten mostly decent. If you're on Windows you should try the powertools OCR tool.

> If you're on Windows you should try the powertools OCR tool.

Which is open source (MIT-licensed), the source code is here: https://github.com/microsoft/PowerToys/tree/main/src/modules...

It is written in C#, and uses the Windows.Media.Ocr UWP API to do the actual OCR part: https://learn.microsoft.com/en-us/uwp/api/windows.media.ocr?... – so if your app runs on Windows it can potentially call the same API and get OCR for free

Apple provides OCR through VisionKit ImageAnalyzer API – https://developer.apple.com/documentation/visionkit/imageana... – albeit that is only officially supported to call from Swift (although apparently you can expose it to Objective C if your write a "proxy Swift framework"–a custom Swift framework that wraps the original and adds @objc everywhere–I assume such a proxy framework could be autogenerated using reflection, but I'm not sure if anyone has written a tool that actually does that). There is also the older VNRecognizeTextRequest API which is supported by Objective C, but its OCR quality is inferior.

I'm not sure what the best answer for Linux or Android is. I guess https://github.com/tesseract-ocr/tesseract ?

A very similar thing is also just built in to the screenshot tool, at least in Windows 11, easier for me to use since it's the same keybind as always to take a screenshot, then it's just a tool in it.