Comment by tcdent

10 hours ago

This is essentially a solved problem. Whenever someone sends me a screenshot that contains text (tables, etc.), I pass it to an LLM, which interprets the content correctly. On modern versions of macOS you can also select text in images directly, relatively painlessly.

Linux desktop users will get there one day.

6 comments

walt_grata 10 hours ago

Or just ask people not to send you data in useless formats. That way you don't have to burn an acre of trees to power it and you help someone be less difficult.

tcdent 9 hours ago

I'm sure they will send you well-written, accurate documentation if you ask, too...
- matt_kantor 9 hours ago
  
  I'm absolutely sure they won't if you don't.

forgotpwd16 9 hours ago

As described in the article, the problem isn't just that the text is an image but that the image usually shows only a subset of the full text. Yes, OCR can help you find the file containing a code segment in your local codebase, but issues mentioned in the article, such as sending a random error line rather than the entire log, remain.
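
For what it's worth, the "find the file" half can be sketched roughly like this. It assumes the text has already been pulled out of the screenshot by whatever OCR tool you use; `find_files_containing` and `normalize` are hypothetical helper names made up for illustration, and the `.py` suffix filter is just an assumed default.

```python
from pathlib import Path


def normalize(text: str) -> str:
    """Collapse all whitespace so OCR line wrapping and indentation differences don't matter."""
    return " ".join(text.split())


def find_files_containing(snippet: str, root: str, suffixes=(".py",)) -> list[Path]:
    """Return files under root whose whitespace-normalized content contains the snippet.

    Hypothetical helper: the snippet is assumed to be OCR output from a screenshot.
    """
    needle = normalize(snippet)
    if not needle:
        return []
    matches = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            content = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file, skip it
        if needle in normalize(content):
            matches.append(path)
    return matches
```

This only does exact matching after whitespace normalization, so a single misread character from the OCR step will make it miss. And it does nothing for the other half of the problem: if the screenshot shows one error line, the rest of the log is still not there to find.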

anonzzzies 9 hours ago

Claude on Linux does it fine, and so do Cursor, Codex, Claude Code, Ollama, etc. Not that I would use any of these for this; if someone sends me a screenshot, it is relevant to me, so I know where to find what is in it quite readily, if I need to at all.

recursive 9 hours ago

Another way it's solved is that clipboards work on text too.