Comment by cphoover
13 hours ago
As I understand it, as the models driving headless-browser agents get more and more sophisticated, agent behavior is getting harder to reliably predict.
In the same way that LLM output without watermarking cannot be reliably classified as "not human", neural-network-driven scraping tools are getting harder to detect.
Cloudflare and DataDome position themselves as companies that can detect automated traffic using signals such as IP reputation, behavior, and timing. But these can be faked: IP reputation through proxy networks, and human behavioral signals with generative AI, the same way text can be. Web bots can use neural networks to generate trajectories and timings similar to those of humans.
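To make the timing point concrete, here is a toy sketch. It is not how Cloudflare or DataDome actually work (I don't know their internals); `looks_automated`, `fixed_delay_bot`, `humanlike_bot`, and every threshold and distribution parameter are made up for illustration. It assumes a naive detector that flags sessions whose gaps between events are too regular, and shows that sampling delays from a skewed distribution is enough to get past that one check.

```python
import random
import statistics

def looks_automated(intervals_ms, min_cv=0.25):
    """Hypothetical naive check: flag sessions whose inter-event gaps are too regular.

    Uses the coefficient of variation (stdev / mean); min_cv is an arbitrary threshold.
    """
    mean = statistics.mean(intervals_ms)
    cv = statistics.stdev(intervals_ms) / mean
    return cv < min_cv

def fixed_delay_bot(n, rng, delay_ms=200.0):
    # A simple script: near-constant delay with a few milliseconds of jitter.
    return [delay_ms + rng.uniform(-5.0, 5.0) for _ in range(n)]

def humanlike_bot(n, rng, mu=5.3, sigma=0.6):
    # Delays drawn from a log-normal distribution; mu and sigma are assumed
    # values, not measured from real users.
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the run is repeatable
    print("fixed-delay bot flagged:", looks_automated(fixed_delay_bot(200, rng)))
    print("human-like bot flagged:", looks_automated(humanlike_bot(200, rng)))
```

A real detector combines many more signals than this, but the same cat-and-mouse logic applies to each one: whatever statistic the detector checks, a generator can be tuned to match it.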
If you can have an AI use a browser the same way a human can, how can you distinguish the two?