Comment by greglindahl
7 years ago
Was it blekko? We had a website owner email us about that issue when blekko's ScoutJet crawler was new... although I don't recall the bit about ignored redirect headers.
I'm pretty sure everyone with a crawler has hit this sort of problem before. The first startup I was at hit it on someone's wiki that had "delete" links everywhere with no auth.
Now that I've hit it once, I watch out for websites with this problem. I was surprised to notice that a Fortune 50 tech company's internal employee-personal-webpages-maker-thingie had that issue. And then a week later they asked me if I could crawl their internal web. Uh, no, who knows what other internal systems have that problem?