Comment by DanielHB

2 days ago

How do you bypass Cloudflare? I do some light scraping for some personal stuff, but I can't figure out how to bypass it. Like, do you randomize IPs using several VPNs at the same time?

I usually just sit there on my phone pressing the "I am not a robot" box when it triggers.

It's still pretty hard to bypass it with open source solutions. To bypass CF you need:

- an automated browser that doesn't leak the fact it's being automated

- ability to fake the browser fingerprint (e.g. Linux is heavily penalized)

- residential or mobile proxies (for small scale your home IP is probably good enough)

- deployment environment that isn't leaked to the browser

- realistic scrape pattern and header configuration (header order, referer, prewalk some pages with cookies etc.)
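To make the last point a bit more concrete, here is a minimal sketch of the "prewalk" idea: visit pages in the order a person would, with a fixed header order and each request's Referer set to the previous page. It only builds the request plan and sends nothing. The `build_headers` and `plan_walk` names, the header values, and the header order are all illustrative assumptions, not a capture of what any real browser sends.

```python
from typing import List, Optional, Tuple

Header = Tuple[str, str]

# Example value only (assumption): a real script would copy this from its own browser.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_headers(referer: Optional[str]) -> List[Header]:
    """Return headers as an ordered list of pairs so the order stays explicit.

    The order used here is illustrative, not a verified browser order.
    """
    headers: List[Header] = [
        ("User-Agent", USER_AGENT),
        ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
        ("Accept-Language", "en-US,en;q=0.9"),
    ]
    # The first page of a walk has no Referer; later pages point at the previous one.
    if referer is not None:
        headers.append(("Referer", referer))
    return headers


def plan_walk(urls: List[str]) -> List[Tuple[str, List[Header]]]:
    """Pair each URL with headers whose Referer is the previously visited URL."""
    plan = []
    previous: Optional[str] = None
    for url in urls:
        plan.append((url, build_headers(previous)))
        previous = url
    return plan
```

Whatever HTTP client you use has to actually preserve that order on the wire, which not all of them do, so treat this as a planning sketch rather than a working bypass.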

This is really hard to do at scale, but for small personal scripts you can get reasonable results with flavor-of-the-month Playwright forks and similar automation tools on GitHub like nodriver, or dedicated tools like FlareSolverr. Still, I'd just find a web scraping API with a low entry price, drop $15 a month, and avoid this chase, because it can be really time-consuming.

If you're really on a budget, most of them offer 1,000 free credits, which gets you about 100 pages a month per service on average, and you can sign up for 10 of them since they all mostly function the same.
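For a rough sense of what that adds up to, here is the arithmetic as a tiny sketch. The 10 credits per page is just what the numbers above imply (1,000 credits for about 100 pages); actual pricing varies by service and by page, and `free_tier_pages` is a made-up helper name.

```python
def free_tier_pages(credits_per_service: int, credits_per_page: int, services: int) -> int:
    """Estimate total pages per month across several free tiers."""
    pages_per_service = credits_per_service // credits_per_page
    return pages_per_service * services


# 1,000 credits at roughly 10 credits per page, across 10 services.
print(free_tier_pages(1000, 10, 10))  # 1000
```

So stacking ten free tiers gets you roughly 1,000 pages a month, which is plenty for light personal scraping.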

I use Camoufox for the browser and "playwright-captcha" for the CAPTCHA solving action. It's not fully reliable but it works.

I believe you would usually bypass it by using residential IPs / proxies?

  • I run it through my home network and I'm still triggering it. I add 2s delays between page loads and it still triggers

    • Well, if that's true... I am so sorry to tell you this, it looks like you are in fact a robot.
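On the 2s delay mentioned above: a fixed delay is a very regular pattern, so one common variation is to randomize it. A minimal sketch follows, assuming a 2s base plus up to 1.5s of random jitter. Both numbers and the `jittered_delays` name are arbitrary choices for illustration, and nothing here is guaranteed to stop the challenge from triggering.

```python
import random
from typing import List, Optional


def jittered_delays(
    count: int,
    base: float = 2.0,
    spread: float = 1.5,
    seed: Optional[int] = None,
) -> List[float]:
    """Return `count` delays in seconds, each between base and base + spread."""
    rng = random.Random(seed)  # pass a seed only when you need reproducible runs
    return [base + rng.uniform(0.0, spread) for _ in range(count)]
```

In a script you would loop over the result and call `time.sleep(delay)` between page loads instead of sleeping a constant 2 seconds.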