
Comment by gyomu

5 hours ago

Like other commenters point out, automatic OCR on Apple platforms is a godsend, and it's such a great use of our modern AI capabilities that it should be a standard feature in every document viewer on every platform.

Another thing I wish were more common is metadata in screenshots, especially on phones. E.g. if I take a screenshot of a picture in Instagram, I wish a URL of the picture was embedded (e.g. instagram.com/p/ABCD1234/). If I take a screenshot in the browser, include the URL that's being viewed (+ path to the DOM element in the viewport). If I take a screenshot in a maps app, include the bounding coordinates. If I take a screenshot in a PDF viewer, include a SHA-1 hash of the document being viewed + offset in the document so that if I send the screenshot to someone else with the same document, it can seamlessly link to it. Etc etc.
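To make the PDF case concrete, here is a rough sketch of how such context could ride along inside the image file itself. It assumes the screenshot is a PNG and stores a small JSON record in a standard `tEXt` chunk placed right after the header. The keyword `ScreenshotContext`, the record fields, and every function name are made up for illustration; no platform is known to do this, and the unit of `offset` (page, byte, scroll position) is left open.

```python
import hashlib
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
KEYWORD = b"ScreenshotContext"  # hypothetical tEXt keyword, not a standard one


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def pdf_context(document: bytes, offset: int) -> dict:
    """Hypothetical record for the PDF case: document hash plus an offset."""
    return {"kind": "pdf", "sha1": hashlib.sha1(document).hexdigest(), "offset": offset}


def embed_context(png: bytes, context: dict) -> bytes:
    """Insert the context as a tEXt chunk directly after the IHDR chunk."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # IHDR always comes first: 8-byte signature + 4 length + 4 type + 13 data + 4 CRC.
    ihdr_end = 8 + 4 + 4 + 13 + 4
    text = json.dumps(context, sort_keys=True).encode("ascii")
    chunk = make_chunk(b"tEXt", KEYWORD + b"\x00" + text)
    return png[:ihdr_end] + chunk + png[ihdr_end:]


def read_context(png: bytes):
    """Walk the chunks and return the embedded record, or None if absent."""
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            if keyword == KEYWORD:
                return json.loads(text.decode("ascii"))
        pos += 12 + length  # length field + type + data + CRC
    return None


def tiny_png() -> bytes:
    """A minimal 1x1 grayscale PNG, standing in for a real screenshot."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # one filter byte + one pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))


if __name__ == "__main__":
    document = b"stand-in for the bytes of the PDF being viewed"
    shot = embed_context(tiny_png(), pdf_context(document, offset=3))
    print(read_context(shot))
```

A recipient who has the same document could hash their local copy, compare it with the embedded `sha1`, and jump to `offset`. Note that the record survives only as long as nothing re-encodes the image; many messaging apps strip ancillary chunks, which is one practical reason an idea like this may be hard to rely on.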

There are probably privacy concerns to solve here, but no idea is new in computer science and I'm pretty sure some grad student somewhere has already explored the topic in depth (it just never made it to mainstream computing platforms).

It feels like screenshots have become the de facto common denominator in our mobile computing era, since platforms have abstracted files away from us. Lots of people who have only ever used phones as their main computing devices are confused when it comes to files, but everyone seems to understand screenshots.

Also, necessary shout out to Screenshot Conf! https://screenshot.arquipelago.org

OCR is a godsend, 100% agree. Not a fan of the metadata idea personally. 'Screenshotting' is done by the operating system, and exposing ways for apps to know that they were 'in' the screenshot, plus letting them attach metadata of their choosing (like your examples of GPS coordinates for a maps app, URL for a browser), sounds like a privacy nightmare, and like something that will make a very reliable core feature much harder to use.

There are companies like Evernote/Zight/CloudApp that at one point tried some things like this, but they never really caught on - I think because it's pretty easy to add annotations yourself or some note of your own - and a screenshot not "trying to do everything" is part of what makes them useful & ubiquitous.

OP here. You raised a point that I should have mentioned in the article: screenshots of web pages that don't include the URL. I'm perfectly fine with screenshots of browser windows, since the context is almost always relevant. The system I work on right now puts a lot of useful context into the URL, but it's almost never included in the initial screenshot, so I have to ask for that. Of course, I generally ask for it as text so that I don't have to try to type the whole thing without making a mistake.

  • I was content to write the original off as "to each his own", but this one I feel you on.

    Maybe the problem is sharing without caring and/or without being aware.

    Case in point: folks copy large blocks of text, as you mentioned, and paste them into Slack, which converts certain characters unless the text is in a code block. This can be much worse than sharing a screenshot.

    Please know the best way to share what you are sharing when you share. I've had to come to expect this request will not be honored.

    I also might be guilty of not honoring sharing with caring myself. For example, I didn't read this entire thread before posting; others may have made this exact point already.

> It feels like screenshots have become the de facto common denominator in our mobile computing era,

Google/Apple have taken notice. Both have recently redone their full-screen post-screenshot UI to include AI insights / automatic product searches / direct chat with Gemini/LLM / etc.

It's true that everyone uses screenshots to save things they are interested in, want to look up or search more of, or want to keep for whatever reason, and this UI is the perfect place for them to insert themselves.

> Eg if I take a screenshot of a picture in Instagram, I wish a URL of the picture was embedded

bloody hell of all privacy concerns

Fun side-fact: The original MacPaint, while in development, had an "OCR" copy feature, albeit much simpler of course.

It didn't make it in the release version out of fear that people would use MacPaint as a Word Processor.

OCR is not AI