Comment by DCoder

7 years ago

Many years ago, I was asked to look at why all the content had vanished from a site (not built by me). After digging in a bit, I found that:

1) the original developer's idea of handling an unauthorized /admin request was just to set a redirect header and continue processing the current request.

2) the /admin page had a grid of all the content on the site, with handy 'Delete' links that ran over GET without confirmation.

You can probably guess where this is going – some search bot hit the overview page, ignored the redirect header, saw the content, and dutifully crawled every single link on it…
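
For illustration, the two flaws could be sketched roughly like this in Python. This is a toy model, not the actual site's code: `Response`, `admin_page_buggy`, `admin_page_fixed`, `delete_item`, and the `/login` and `/admin/delete` paths are all made-up names.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    """Hypothetical stand-in for a framework's response object."""
    status: int = 200
    headers: dict = field(default_factory=dict)
    body: str = ""


# Hypothetical site content, keyed by id.
CONTENT = {1: "About us", 2: "Pricing"}


def render_grid() -> str:
    # One "Delete" link per item, as on the admin overview page.
    return "\n".join(
        f'<a href="/admin/delete?id={item_id}">Delete {title}</a>'
        for item_id, title in CONTENT.items()
    )


def admin_page_buggy(is_admin: bool) -> Response:
    resp = Response()
    if not is_admin:
        resp.status = 302
        resp.headers["Location"] = "/login"
        # Flaw 1: no early return, so the page is still rendered below.
    resp.body = render_grid()
    return resp


def admin_page_fixed(is_admin: bool) -> Response:
    if not is_admin:
        # Stop processing as soon as the redirect is decided.
        return Response(status=302, headers={"Location": "/login"})
    return Response(body=render_grid())


def delete_item(method: str, is_admin: bool, item_id: int, store: dict) -> Response:
    if not is_admin:
        return Response(status=403)
    if method != "POST":
        # Addresses flaw 2: a plain GET (which is all a crawler sends
        # when following a link) must never change state.
        return Response(status=405, headers={"Allow": "POST"})
    store.pop(item_id, None)
    return Response(status=303, headers={"Location": "/admin"})
```

A client that ignores the `Location` header still receives the full grid from `admin_page_buggy(False)`, whereas `admin_page_fixed(False)` returns an empty body. The delete handler checks authorization itself instead of relying on the overview page being hidden.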

4 comments

acdha 7 years ago

There were at least two browser extensions which also discovered that poor design was widespread and had to disable prefetching for similar reasons:

http://fasterfox.mozdev.org/index.html

https://signalvnoise.com/archives2/google_web_accelerator_he...

I think the state of the web has improved slightly over the last decade, but this is a great example of why browser vendors are so conservative. You can do this now, but only as an opt-in.

greglindahl 7 years ago

Was it blekko? We had a website owner email us about that issue when blekko's ScoutJet crawler was new... although I don't recall the bit about ignored redirect headers.

saalweachter 7 years ago

I'm pretty sure everyone with a crawler has hit this sort of problem before. The first startup I was at did, with someone's wiki that had "delete" links everywhere with no auth.

greglindahl 7 years ago

Now that I've hit it once, I watch out for websites with this problem. I was surprised to notice that a Fortune 50 tech company's internal employee-personal-webpages-maker-thingie had that issue. And then a week later they asked me if I could crawl their internal web. Uh, no, who knows what other internal systems had that problem?