
Comment by vharish

6 months ago

I'm guessing that would ideally mean only reading the content the user would otherwise have gone through. I wonder if that's the case and if it's guaranteed.

Maybe new standards, and perhaps user-configurable per-site permissions, would make it better?

I'm curious to see how this turns out.

> only reading the content the user would otherwise have gone through.

Why? My user agent is configured to make things easier for me and allow me to access content that I wouldn't otherwise choose to access. Dark mode allows me to read late at night. Reader mode allows me to read content that would otherwise be unbearably cluttered. I can zoom in on small text to better see it.

Should my reader mode or dark mode or zoom feature have to respect robots.txt because otherwise they'd allow me to access content that I would otherwise have chosen to leave alone?

  • Yeah no, none of that helps you bypass the ads on their website*, but scraping and summarizing does, so it's wildly different for monetization purposes, and in most cases monetization is what decides the maintainability and survival of any given website.

    I know it's not completely true, I know reader mode can help you bypass the ads _after_ you already had a peek at the cluttered version, but if you need to go to the next page or something like that, you have to disable reader mode each time. So it's a very granular form of ad blocking, while many AI use cases are about the page never being viewed by a human at all. The other thing is that reader mode is not very popular, so it's not a significant threat.

    *or other links on their websites, or informative banners, etc.

    • > I know it's not completely true, I know reader mode can help you bypass the ads _after_ you already had a peek at the cluttered version

      What about reader mode that is auto-configured to turn on immediately on landing on specific domains? Is that a robot for the purposes of robots.txt?

      https://addons.mozilla.org/en-US/firefox/addon/automatic-rea...

      And also, just to confirm, I'm to understand that if I'm navigating the internet with an ad blocker then you believe that I should respect robots.txt because my user agent is now a robot by virtue of using an ad blocker?

      Is that also true if I browse with a terminal-based browser that simply doesn't render JavaScript or images?


    • robots.txt is not there to protect your ad-based business model. It's meant for automated scrapers that recursively retrieve all pages on your website, which this browser is not doing at all. What a user does with a page after it has entered their browser is their own prerogative.
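
For anyone who hasn't looked at how robots.txt is actually consumed, here is a minimal sketch of the check a well-behaved crawler might run before each fetch. It uses Python's standard `urllib.robotparser`. The robots.txt contents, the `ExampleCrawler` agent name, the `allowed` helper, and the example.com URLs are all made up for illustration; they are not taken from any site or tool mentioned in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is kept out of /private/,
# every other user agent is allowed everywhere.
ROBOTS_TXT = """\
User-agent: ExampleCrawler
Disallow: /private/

User-agent: *
Disallow:
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ExampleCrawler", "Mozilla/5.0"):
        for path in ("/private/page", "/public/page"):
            url = "https://example.com" + path
            print(agent, path, allowed(agent, url))
```

The file only states a preference keyed on the user-agent string. Nothing enforces it, and it is up to each client to decide whether it counts as a robot and runs a check like this, which is what the disagreement above comes down to.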
