Comment by joecool1029
10 hours ago
Which they don’t respect. I’ve had it on my blog for years and they still added it to the Wayback Machine. See my last comment for their official announcement of the ignore-robots.txt policy; it is not new.
robots.txt means they shouldn't auto-scan your site. Any user, though, can go to the Wayback Machine and type in a URL, and the Wayback Machine will read that URL. That was the intent of robots.txt: don't scan, not don't read, period. It's spelled out in the spec for robots.txt.
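To make the "don't scan" part concrete, here is a minimal sketch of how a cooperating crawler could check robots.txt before fetching a page, using Python's standard `urllib.robotparser`. The rules in `ROBOTS_TXT`, the `example.com` URLs, the `SomeOtherBot` name, and the `can_crawl` helper are all made up for illustration; only the `archive.org_bot` user agent name comes from this thread. The check is voluntary: it tells a crawler what the site asks for, and nothing forces the crawler to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks archive.org_bot everywhere,
# and blocks every other crawler from /private/ only.
ROBOTS_TXT = """\
User-agent: archive.org_bot
Disallow: /

User-agent: *
Disallow: /private/
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt allow user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("archive.org_bot", "SomeOtherBot"):
        for url in ("https://example.com/blog/post",
                    "https://example.com/private/x"):
            print(agent, url, can_crawl(ROBOTS_TXT, agent, url))
```

Nothing in this check distinguishes an automated crawl from a fetch triggered by a user typing in a URL; that distinction is a policy decision on the crawler's side.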
The <meta name="robots"> tag and robots.txt serve different roles: robots.txt controls crawling, while the robots meta tag influences indexing and other behavior. https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/...
I wonder how archive.org_bot behaves when <meta name="robots" content="noindex, noarchive, nocache" /> is present.
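For reference, this is roughly how a crawler that chose to honor the tag could read it. It is a minimal sketch using Python's standard `html.parser`; the `RobotsMetaParser` class and `robots_directives` helper are hypothetical names, and it says nothing about what archive.org_bot actually does with these directives.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<meta ... />) also end up here by default.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated and case-insensitive.
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = ('<html><head><meta name="robots" '
            'content="noindex, noarchive, nocache" /></head></html>')
    print(sorted(robots_directives(page)))
```

Note that a crawler has to fetch the page to see this tag at all, which is why it cannot replace robots.txt for blocking the crawl itself.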
> I’ve had it for my blog for years
Just out of curiosity, why don't you want your public blog archived? Not questioning, just trying to understand the logic/motivations.
Also, I think you're being unfairly downvoted.