Comment by tcdent
10 hours ago
This is essentially a solved problem. Whenever someone sends me a screenshot that contains any textual information (tables, etc.), I pass it to an LLM and it correctly interprets the content. On modern versions of macOS you can just select text in images relatively painlessly, too.
Linux desktop users will get there one day.
Or just ask people not to send you data in useless formats. That way you don't have to burn an acre of trees to power it and you help someone be less difficult.
I'm sure they will send you well-written, accurate documentation if you ask, too...
I'm absolutely sure they won't if you don't.
As described in the article, the problem isn't just that the text is an image but that the image usually shows only a subset of the entire text. Yes, OCR can help you find the file containing a code segment in your local codebase, but issues mentioned in the article, such as sending a random error line rather than the entire log, remain.
Claude on Linux does it fine, and so do Cursor, Codex, Claude Code, Ollama, etc. Not that I would use any of these for this; if someone sends me a screenshot, it is relevant to me, so I know where to find what is in it quite readily, if needed at all.
Another way it's solved is that clipboards work on text too.