Comment by joecool1029

10 hours ago

Which they don’t respect. I’ve had it on my blog for years and they still added it to the Wayback Machine. See my last comment for their official announcement of the ignore-robots.txt policy; it is not new.

3 comments

socalgal2 9 hours ago

robots.txt means they shouldn't auto-scan your site. Any user, though, can go to the Wayback Machine and type in a URL, and the Wayback Machine will read that URL. The intent of robots.txt was "don't scan", not "don't read, period". It's spelled out in the spec for robots.txt.
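
To make the "don't scan" part concrete, here is a rough sketch of how a well-behaved crawler could check robots.txt before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URLs, and the `may_crawl` helper are made up for illustration; the `archive.org_bot` token is the one mentioned elsewhere in this thread. It only shows what the rules say, not whether any particular crawler honors them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; not taken from any real site.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules allow user_agent to crawl url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_crawl("archive.org_bot", "https://example.com/post"))
    print(may_crawl("SomeOtherBot", "https://example.com/post"))
    print(may_crawl("SomeOtherBot", "https://example.com/private/notes"))
```

Note that this check is something the crawler chooses to run on its own side; nothing in robots.txt can stop a fetch that a user triggers by hand.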

keane 8 hours ago

The <meta name="robots"> tag and robots.txt serve different roles: robots.txt controls crawling, while the robots meta tag influences indexing and other behavior. https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...
I wonder how archive.org_bot behaves when <meta name="robots" content="noindex, noarchive, nocache" /> is present.
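
For anyone who wants to see what a bot would have to look at, here is a minimal sketch of reading the robots meta tag out of a page with Python's standard `html.parser`. The `RobotsMetaParser` class and `robots_directives` helper are hypothetical names for this example, and the sketch says nothing about what archive.org_bot actually does with these directives.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <meta ... /> also end up here by default.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated; normalize case and whitespace.
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set[str]:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, noarchive, nocache" /></head></html>'
    print(sorted(robots_directives(page)))
```

Unlike robots.txt, a bot has to fetch and parse the page before it can see these directives, which is why the two mechanisms cover different stages.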

ninjagoo 2 hours ago

> I’ve had it for my blog for years

Just out of curiosity, why don't you want your public blog archived? Not questioning, just trying to understand the logic/motivations.

Also, I think you're being unfairly downvoted.