Comment by tcdent

10 hours ago

This is essentially a solved problem. Whenever someone sends me a screenshot that contains any text information (tables, etc.), I pass it to an LLM and it correctly interprets the content. On modern versions of macOS you can just select text in images relatively painlessly, too.

Linux desktop users will get there one day.

Or just ask people not to send you data in useless formats. That way you don't have to burn an acre of trees to power it and you help someone be less difficult.

As described in the article, the problem isn't just that the text is an image, but that the image usually shows only a subset of the entire text. Yes, OCR can help find the file containing a code segment in your local codebase, but issues mentioned in the article, such as sending a random error line rather than the entire log, remain.

Claude on Linux does it fine, and so do Cursor, Codex, Claude Code, Ollama, etc. Not that I would use any of these for this: if someone sends me a screenshot, it is relevant to me, so I know where to find what is in it quite readily, if that is needed at all.