← Back to context

Comment by magnat

7 months ago

> if you copy and paste your "improved" text then it will no longer say what you thought it did

It's not a bug, it's a feature - a DRM. Your content can now be consumed, but cannot be copied or modified - all without external tools, as long as you embed that TTF somehow.

Which kind of reminds me of a PDF invoices I got from my electricity provider. It looked and printed perfectly fine, but used weird codepoint mapping which resulted in complete garbage when trying to copy any text from it. Fun times, especially when pasting account number to a banking app.

This is while pretty much all software that extracts structured data from PDFs throws away the text and just OCRs the page. Too many tricks with layouts and fonts.

  • I'm always surprised how "generate PDF from Word" turns one word into 10 different print points, all with just a single letter.

    Or even straight lines in a table. The straight lines from a table boundary get hacked into pieces. You'd think one line would be the ideal presentation for a line, but who are you to judge PDF?

Eh, what AI taketh, AI can give; modern OCR has gotten mostly decent. If you're on Windows you should try the powertools OCR tool.

  • > If you're on Windows you should try the powertools OCR tool.

    Which is open source (MIT-licensed), the source code is here: https://github.com/microsoft/PowerToys/tree/main/src/modules...

    It is written in C#, and uses the Windows.Media.Ocr UWP API to do the actual OCR part: https://learn.microsoft.com/en-us/uwp/api/windows.media.ocr?... – so if your app runs on Windows it can potentially call the same API and get OCR for free

    Apple provides OCR through VisionKit ImageAnalyzer API – https://developer.apple.com/documentation/visionkit/imageana... – albeit that is only officially supported to call from Swift (although apparently you can expose it to Objective C if your write a "proxy Swift framework"–a custom Swift framework that wraps the original and adds @objc everywhere–I assume such a proxy framework could be autogenerated using reflection, but I'm not sure if anyone has written a tool that actually does that). There is also the older VNRecognizeTextRequest API which is supported by Objective C, but its OCR quality is inferior.

    I'm not sure what the best answer for Linux or Android is. I guess https://github.com/tesseract-ocr/tesseract ?

  • A very similar thing is also just built in to the screenshot tool, at least in Windows 11, easier for me to use since it's the same keybind as always to take a screenshot, then it's just a tool in it.