
Comment by DCoder

7 years ago

Many years ago, I was asked to look at why all the content had vanished from a site (not built by me). After digging in a bit, I found that:

1) the original developer's idea of handling an unauthorized /admin request was just to set a redirect header and then continue processing the current request anyway.

2) the /admin page had a grid of all the content on the site, with handy 'Delete' links that ran over GET without confirmation.

You can probably guess where this is going – some search bot hit the overview page, ignored the redirect header, saw the content, and dutifully crawled every single link on it…
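For anyone who wants to see the two bugs side by side, here is a minimal sketch in Python. It is not the original site's code (I don't have it, and it may not have been Python at all); the `Response` class, the handler names, the `/login` target, and the `/admin/delete?id=...` URL shape are all made up for illustration. The buggy handler sets the redirect but keeps going, and deletes over GET; the fixed one returns as soon as the auth check fails and only deletes on POST.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    """Hypothetical stand-in for a framework's response object."""
    status: int = 200
    headers: dict = field(default_factory=dict)
    body: str = ""


def render_grid(content):
    # One 'Delete' link per item, the way the /admin grid was described.
    return "\n".join(
        f'<a href="/admin/delete?id={item_id}">Delete {title}</a>'
        for item_id, title in sorted(content.items())
    )


def admin_buggy(method, path, query, is_admin, content):
    resp = Response()
    if not is_admin:
        resp.status = 302
        resp.headers["Location"] = "/login"
        # BUG 1: no return here, so the rest of the handler still runs.
    if path == "/admin/delete":
        # BUG 2: destructive action on any method, including GET.
        content.pop(int(query["id"]), None)
    else:
        resp.body = render_grid(content)
    return resp


def admin_fixed(method, path, query, is_admin, content):
    if not is_admin:
        # Stop processing as soon as the auth check fails.
        return Response(status=302, headers={"Location": "/login"})
    if path == "/admin/delete":
        if method != "POST":
            # Crawlers follow links with GET, so refuse GET for deletes.
            return Response(status=405, headers={"Allow": "POST"})
        content.pop(int(query["id"]), None)
        return Response(status=303, headers={"Location": "/admin"})
    return Response(body=render_grid(content))
```

With the buggy version, an anonymous GET to `/admin` comes back as a 302 whose body still contains every delete link, and an anonymous GET to any of those links removes the item. With the fixed version the same two requests get an empty redirect and nothing is deleted. A real fix would also want a CSRF token and a confirmation step, which this sketch leaves out.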

Was it blekko? We had a website owner email us about that issue when blekko's ScoutJet crawler was new... although I don't recall the bit about ignored redirect headers.

  • I'm pretty sure everyone with a crawler has hit this sort of problem before. The first startup I was at ran into it with someone's wiki that had "delete" links everywhere with no auth.

    • Now that I've hit it once, I watch out for websites with this problem. I was surprised to notice that a Fortune 50 tech company's internal employee-personal-webpages-maker-thingie had that issue. And then a week later they asked me if I could crawl their internal web. Uh, no, who knows what other internal systems had that problem?