Comment by gyomu
5 hours ago
Like other commenters point out, automatic OCR on Apple platforms is a godsend, and it's such a great use of our modern AI capabilities that it should be a standard feature in every document viewer on every platform.
Another thing I wish was more common is metadata in screenshots, especially on phones. Eg if I take a screenshot of a picture in Instagram, I wish a URL of the picture was embedded (eg instagram.com/p/ABCD1234/). If I take a screenshot in the browser, include the URL that's being viewed (+ path to the DOM element in the viewport). If I take a screenshot in a maps app, include the bounding coordinates. If I take a screenshot in a PDF viewer, include a SHA1 hash of the document being viewed + offset in the document so that if I send the screenshot to someone else with the same document, it can seamlessly link to it. Etc etc.
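For the PDF case, here's a rough sketch of what I have in mind. Everything in it is made up for illustration: the field names (`doc_sha1`, `page`, `y_offset`) and both helper functions are hypothetical, not any platform's actual screenshot format. The sender hashes the document and records where in it the screenshot was taken; the recipient checks the hash against their own copy before jumping to that spot.

```python
import hashlib
import json


def make_screenshot_metadata(document_bytes: bytes, page: int, y_offset: float) -> str:
    """Build a hypothetical metadata payload for a screenshot of a document."""
    payload = {
        # SHA-1 of the whole file identifies "the same document" on both ends.
        "doc_sha1": hashlib.sha1(document_bytes).hexdigest(),
        # Position of the viewport inside the document (assumed representation).
        "page": page,
        "y_offset": y_offset,
    }
    return json.dumps(payload, sort_keys=True)


def resolve_link(metadata_json: str, local_document: bytes):
    """Return (page, y_offset) if the recipient's copy matches, else None."""
    meta = json.loads(metadata_json)
    if hashlib.sha1(local_document).hexdigest() != meta["doc_sha1"]:
        return None  # different file, so the offset would be meaningless
    return meta["page"], meta["y_offset"]
```

The payload is plain JSON so it could ride along in whatever text metadata field the image format offers; how it actually gets attached to the screenshot is left open here.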
There are probably privacy concerns to solve here, but no idea is new in computer science and I'm pretty sure some grad student somewhere has already explored the topic in depth (it just never made it to mainstream computing platforms).
It feels like screenshots have become the de facto common denominator in our mobile computing era, since platforms have abstracted files away from us. Lots of people who have only ever used phones as their main computing devices are confused when it comes to files, but everyone seems to understand screenshots.
Also, necessary shout out to Screenshot Conf! https://screenshot.arquipelago.org
OCR is a godsend, 100% agree. Not a fan of the metadata idea personally: 'screenshotting' is done by the operating system, and letting apps know that they were 'in' the screenshot and expose some metadata of their choosing (like your examples of GPS coordinates for a maps app, URL for a browser) sounds like a privacy nightmare, and like something that will make a very reliable core feature much harder to use.
There are companies like Evernote/Zight/CloudApp that at one point tried some things like this, but they never really caught on. I think that's because it's pretty easy to add annotations or a note of your own, and a screenshot not "trying to do everything" is part of what makes screenshots useful & ubiquitous.
But apps (Snapchat most notably comes to mind) have been doing exactly that analysis. Theoretically they could then [offer to] edit the photo immediately afterwards to add context, since they had access to the photo roll or files https://android.stackexchange.com/a/119767
OP here. You raised a point that I should have mentioned in the article: screenshots of web pages that don't include the URL. I'm perfectly fine with screenshots of browser windows, since the context is almost always relevant. The system I work on right now puts a lot of useful context into the URL, but it's almost never included in the initial screenshot, so I have to ask for that. Of course, I generally ask for it as text so that I don't have to try to type the whole thing without making a mistake.
I was content to write the original off as "to each his own", but this one I feel you on.
Maybe the problem is sharing without caring and/or without being aware.
Case in point: folks capture large blocks of text, as you mentioned, and paste them into Slack, which converts certain characters unless they are inside a code block. This can be much worse than sharing a screenshot.
Please know the best way to share what you are sharing when you share. I've had to come to expect this request will not be honored.
I also might be guilty of not honoring sharing with caring myself. For example, I didn't read this entire thread before posting; others may have made this exact point already.
> It feels like screenshots have become the de facto common denominator in our mobile computing era,
Google/Apple have taken notice. Both have recently redone their full-screen post-screenshot UI to include AI insights / automatic product searches / direct chat with Gemini/LLM / etc.
It's true that everyone uses screenshots to save things they are interested in, want to look up or search more of, or want to keep for whatever reason, and this UI is the perfect place for them to insert themselves.
> Eg if I take a screenshot of a picture in Instagram, I wish a URL of the picture was embedded
bloody hell of all privacy concerns
Fun side-fact: The original MacPaint, while in development, had an "OCR" copy feature, albeit much simpler of course.
It didn't make it in the release version out of fear that people would use MacPaint as a Word Processor.
OCR is not AI
Source: https://en.wikipedia.org/wiki/AI_effect
But AI can OCR
They do so by running the image through an OCR tool call
That's a thing I always marvel at: how LLMs are so versatile and do so well at so much stuff that was out of reach just a few years ago
Yes but they're quite good at it. Reliable OCR is font dependent, whereas I think a lot of models just kind of figure it out regardless.
One reason I don't quite trust AI for OCR is that it will, on occasion, hallucinate the output.
God of the gaps