Comment by iknownothow

1 year ago

I've been inadvertently working on this topic and I'd like to share some findings.

* Do not confuse bots with DDoS. While bot traffic may end up overwhelming your server, your DDoS protection SaaS will not stop that traffic unless you have some kind of bot protection enabled, for example the product described in the post.

* A lot of bots announce themselves via their user agents; some don't.

* If you're running an ecom shop with a lot of product pages, expect a large portion of traffic to be bots and scrapers. In our case it was up to 50%, which was surprising.

* Some bots accept cookies and these skew your product analytics.

* We enabled automatic bot protection and a lot of our third-party integrations ended up being marked as bots and their traffic was blocked. We eventually turned that off.

* (EDIT) Any sophisticated self-implemented bot protection isn't worth the effort for most companies out there. But I have to admit, it's very exciting to think about all the ways to block bots.
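
For anyone who just wants a rough number before buying anything, here is a minimal sketch of the user-agent point above. It only catches bots that announce themselves, and the keyword pattern and the integration names are assumptions made up for illustration, not taken from any real product. Checking an allowlist first is one way to avoid the problem we had with third-party integrations being flagged.

```python
import re
from collections import Counter

# Hypothetical keyword pattern; real bot user agents vary widely.
BOT_PATTERN = re.compile(r"bot|crawl|spider|scrape|slurp", re.IGNORECASE)

# Hypothetical names of third-party integrations that must never be flagged.
ALLOWED_INTEGRATIONS = ("ExamplePaymentsWebhook", "ExampleUptimeMonitor")


def classify(user_agent: str) -> str:
    """Return 'integration', 'likely_bot', or 'unknown' for a user-agent string."""
    lowered = user_agent.lower()
    # Check the allowlist first so integrations are not caught by the bot pattern.
    if any(name.lower() in lowered for name in ALLOWED_INTEGRATIONS):
        return "integration"
    # An empty user agent or a bot-like keyword is treated as a likely bot.
    if not lowered.strip() or BOT_PATTERN.search(lowered):
        return "likely_bot"
    # Either a human or a bot that does not announce itself.
    return "unknown"


def bot_share(user_agents) -> float:
    """Fraction of requests classified as 'likely_bot' (0.0 for empty input)."""
    counts = Counter(classify(ua) for ua in user_agents)
    total = sum(counts.values())
    return counts["likely_bot"] / total if total else 0.0
```

Feeding it the user-agent column of an access log gives a lower bound on bot traffic, since bots that pose as browsers land in 'unknown'.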

What's our current status? We've enabled monitoring to keep a lookout for DDoS attempts, but we're taking the hit on bot traffic. The data on our website isn't really private info, except maybe pricing, and we're really unsure how to think about the new AI bots scraping this information. ChatGPT already gives a summary of what our company does. We don't know if that's a good thing or not. Would be happy to hear anyone's thoughts on how to think about this topic.

> If you're running an ecom shop with a lot of product pages, expect a large portion of traffic to be bots and scrapers.

It's crazy; I registered a new website last month, and every day I get around 200 visitors, for a landing page only! This site is not mentioned or advertised anywhere. The only list where you might find it is in the newly registered domains.

  • > The only list where you might find it is in the newly registered domains.

    No registration anywhere needed, they'll find you, because you have an IP address. I've set up enough machines without any registration and some hours after they got connected, the usual suspects showed up.

    And regarding bots: even if machines don't have e.g. PHP installed, they'll see oodles of attempts to access links ending in *.php. That's the place where I liked to offer randomly encrypted Linux kernels for them to digest ;-)

    • That’s actually a smart thing to do! I noticed the extension too, and even the typical WordPress paths. I'd understand it for a known site, but one that was registered hours ago? Unbelievable.

  • > This site is not mentioned or advertised anywhere. The only list where you might find it is in the newly registered domains.

    Well, that's one place already. Another is in the published list of new HTTPS certificates. As such, "not mentioned" doesn't hold true.

    • > Another is in the published list of new HTTPS certificates

      True, but it’s one of millions, and the number of visits is still crazy.

      > As such, "not mentioned" doesn't hold true.

      I meant by me.