Comment by VladVladikoff
17 hours ago
This is a fundamental misunderstanding of what those bots are requesting. They aren’t parsing those PHP files, they are using their existence for fingerprinting — they are trying to determine the existence of known vulnerabilities. They probably stop reading as soon as they receive an HTTP response code and discard the rest of the response.
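A minimal sketch of what that kind of status-code-only fingerprinting looks like. The probe paths are illustrative examples, not a real scanner's wordlist, and the stub "server" stands in for a target host:

```python
# Paths whose mere existence hints at a known-vulnerable install.
# These are illustrative examples only.
PROBE_PATHS = [
    "/wp-login.php",
    "/phpmyadmin/index.php",
    "/vendor/phpunit/phpunit/src/Util/PHP/eval-stdin.php",
]

def fingerprint(get_status):
    """Return the probe paths that appear to exist.

    `get_status` maps a path to an HTTP status code; a real scanner
    would issue the request, read only the status line, and never
    parse the body.
    """
    return [p for p in PROBE_PATHS if get_status(p) == 200]

# Stub target: only one of the probed paths exists.
fake_server = {"/wp-login.php": 200}
hits = fingerprint(lambda p: fake_server.get(p, 404))
# hits == ["/wp-login.php"]
```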
> They aren’t parsing those PHP files, they are using their existence for fingerprinting — they are trying to determine the existence of known vulnerabilities.
So would the natural strategy then be to flag some vulnerability of interest? Either one that typically requires more manual effort (waste their time), or one that is easily automated, so as to trap the bot in a honeypot — i.e. "you got in, what do you do next? oh, upload all your kit and show how you work? sure". See: The Cuckoo's Egg.
You're right, something like fail2ban or crowdsec would probably be more effective here. Crowdsec has made it apparent to me how much vulnerability probing goes on; it's a bit shocking for a low-traffic host.
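For reference, a hypothetical fail2ban `jail.local` fragment for this kind of setup — the jail name, filter, and paths are assumptions, and `nginx-probe-404` would need a matching filter file, so this is a sketch rather than a drop-in config:

```ini
[nginx-probe-404]
enabled  = true
port     = http,https
logpath  = /var/log/nginx/access.log
maxretry = 5
findtime = 10m
bantime  = 4h
```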
And you'd ban the IP, their one-day lease on the VM+IP would expire, someone else would get the same IP on a new VM and be blocked from everywhere.
It would be enough to ban the IP for a few hours, to let the bot cool down for a bit and move on to the next domain.
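A few-hour cool-down ban can be sketched as a ban list whose entries simply expire. This is an assumed structure, not how fail2ban or crowdsec are implemented internally; the 4-hour default just mirrors the crowdsec figure mentioned below:

```python
import time

BAN_SECONDS = 4 * 60 * 60  # 4-hour cool-down

class TempBanList:
    """Temporary IP bans that lapse on their own, avoiding the
    stale-ban problem when the IP is re-leased to someone else."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable clock, handy for testing
        self._bans = {}      # ip -> expiry timestamp

    def ban(self, ip, duration=BAN_SECONDS):
        self._bans[ip] = self._clock() + duration

    def is_banned(self, ip):
        expiry = self._bans.get(ip)
        if expiry is None:
            return False
        if self._clock() >= expiry:
            del self._bans[ip]  # ban expired: drop the stale entry
            return False
        return True
```

With a fake clock you can see the ban lapse: ban an IP at t=0, and after `BAN_SECONDS` elapses `is_banned` returns `False` again.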
I was referring to the rules/patterns provided by crowdsec rather than the distribution of known "bad" IPs through their Central API.
The default ban for traffic detected by your crowdsec instance is 4 hours, so that concern isn't very relevant in that case.
The decisions from the Central API from other users can be quite a bit longer (I see some at ~6 days), but you also don't have to use those if you're worried about that scenario.
It would be such a terrible thing if some LLM scrapers were using those responses to learn more about PHP, especially given that recent paper pointing out that it doesn't take many data points to poison an LLM.