Comment by VladVladikoff
17 hours ago
This is a fundamental misunderstanding of what those bots are requesting. They aren’t parsing those PHP files, they are using their existence for fingerprinting — they are trying to determine the existence of known vulnerabilities. They probably stop reading as soon as they receive an HTTP response code and discard the rest of the response.
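A minimal sketch of what that kind of status-code-only fingerprinting looks like. The probe paths are illustrative examples, not a real scanner's wordlist, and the stub "server" stands in for a target host:

```python
# Paths whose mere existence hints at a known-vulnerable install.
# These are illustrative examples only.
PROBE_PATHS = [
    "/wp-login.php",
    "/phpmyadmin/index.php",
    "/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php",
]

def fingerprint(get_status):
    """Return the probe paths that appear to exist.

    `get_status` maps a path to an HTTP status code; a real scanner
    would issue the request, read only the status line, and never
    parse the body.
    """
    return [p for p in PROBE_PATHS if get_status(p) == 200]

# Stub target: only one of the probed paths exists.
fake_server = {"/wp-login.php": 200}
hits = fingerprint(lambda p: fake_server.get(p, 404))
# hits == ["/wp-login.php"]
```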
> They aren’t parsing those PHP files, they are using their existence for fingerprinting — they are trying to determine the existence of known vulnerabilities.
So would the natural strategy then be to flag some vulnerability of interest? Either one that typically requires more manual effort (waste their time), or one that is easily automated, so as to trap the bot in a honeypot — i.e. "you got in, what do you do next? oh, upload all your kit and show how you work? sure". See: The Cuckoo's Egg.
You're right, something like fail2ban or crowdsec would probably be more effective here. Crowdsec has made it apparent to me how much vulnerability probing goes on; it's a bit shocking for a low-traffic host.
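For reference, a hypothetical fail2ban `jail.local` fragment for this kind of setup — the jail name, filter, and paths are assumptions, and `nginx-probe-404` would need a matching filter file, so this is a sketch rather than a drop-in config:

```ini
[nginx-probe-404]
enabled  = true
port     = http,https
logpath  = /var/log/nginx/access.log
maxretry = 5
findtime = 10m
bantime  = 4h
```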
And you'd ban the IP, their one-day lease on the VM+IP would expire, someone else would get the same IP on a new VM and be blocked from everywhere.
It would be enough to ban the IP for a few hours, to let the bot cool down for a bit and move on to the next domain.
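A few-hour cool-down ban can be sketched as a ban list whose entries simply expire. This is an assumed structure, not how fail2ban or crowdsec are implemented internally; the 4-hour default just mirrors the crowdsec figure mentioned below:

```python
import time

BAN_SECONDS = 4 * 60 * 60  # 4-hour cool-down

class TempBanList:
    """Temporary IP bans that lapse on their own, avoiding the
    stale-ban problem when the IP is re-leased to someone else."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable clock, handy for testing
        self._bans = {}      # ip -> expiry timestamp

    def ban(self, ip, duration=BAN_SECONDS):
        self._bans[ip] = self._clock() + duration

    def is_banned(self, ip):
        expiry = self._bans.get(ip)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._bans[ip]  # ban expired: drop the stale entry
            return False
        return True
```

With a fake clock you can see the ban lapse: ban an IP at t=0, and after `BAN_SECONDS` elapses `is_banned` returns `False` again.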
I was referring to the rules/patterns provided by crowdsec rather than the distribution of known "bad" IPs through their Central API.
The default ban for traffic detected by your crowdsec instance is 4 hours, so that concern isn't very relevant in that case.
The decisions from the Central API from other users can be quite a bit longer (I see some at ~6 days), but you also don't have to use those if you're worried about that scenario.
It would be such a terrible thing if some LLM scrapers were using those responses to learn more about PHP, especially given that recent paper pointing out that it doesn't take many data points to poison an LLM.