Comment by wraptile

2 days ago

It's still pretty hard to bypass it with open source solutions. To bypass CF you need:

- an automated browser that doesn't leak the fact it's being automated

- ability to fake the browser fingerprint (e.g. Linux is heavily penalized)

- residential or mobile proxies (for small scale your home IP is probably good enough)

- a deployment environment that isn't leaked to the browser

- realistic scrape pattern and header configuration (header order, referer, prewalk some pages with cookies etc.)
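To make the last point concrete, here is a rough sketch of what "header order, referer, prewalk" can look like as a plan, before any request is sent. Everything in it is an assumption for illustration: the `build_plan` helper, the header values, and the delay range are made up, not taken from any particular tool. It only builds the plan; sending it is up to whatever HTTP client or browser you use.

```python
import random
from typing import Dict, List, Tuple

Header = Tuple[str, str]

# Hypothetical header set; a list of tuples keeps the order explicit.
BASE_HEADERS: List[Header] = [
    ("User-Agent", "ExampleBrowser/1.0"),  # placeholder value, not a real UA
    ("Accept", "text/html,application/xhtml+xml"),
    ("Accept-Language", "en-US,en;q=0.9"),
]


def build_plan(urls: List[str], seed: int = 0,
               min_delay: float = 2.0, max_delay: float = 8.0) -> List[Dict]:
    """Return one step per URL, with a chained Referer and a jittered delay."""
    rng = random.Random(seed)  # seeded so the plan is reproducible
    plan: List[Dict] = []
    previous = None
    for url in urls:
        headers = list(BASE_HEADERS)  # copy so steps don't share state
        if previous is not None:
            # Each page claims to come from the one visited before it.
            headers.append(("Referer", previous))
        plan.append({
            "url": url,
            "headers": headers,
            "delay": rng.uniform(min_delay, max_delay),  # seconds to wait first
        })
        previous = url
    return plan


# Prewalk the landing page and a listing page before the target page.
steps = build_plan([
    "https://example.com/",
    "https://example.com/list",
    "https://example.com/item/1",
])
for step in steps:
    print(round(step["delay"], 1), step["url"], dict(step["headers"]).get("Referer"))
```

Cookies picked up on the earlier pages would need to be carried along by the client session; that part is left out here.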

This is really hard to do at scale, but for small personal scripts you can get reasonable results with flavor-of-the-month Playwright forks and similar stealth-automation projects on GitHub, such as nodriver, or with dedicated tools like FlareSolverr. Still, I'd just find a web scraping API with a low entry price, drop $15 a month, and avoid this chase, because it can be really time-consuming.

If you're really on a budget, most of them offer 1,000 free credits, which gets you about 100 pages a month per service on average, and you can sign up for 10 of them since they all mostly function the same.
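The free-tier math works out roughly like this. The 10 credits per page figure is an assumption, implied by 1,000 credits buying about 100 pages; real pricing varies by service and by page type, and `monthly_pages` is just a helper made up for the arithmetic.

```python
def monthly_pages(free_credits: int, credits_per_page: int, services: int = 1) -> int:
    """Pages per month across `services` free tiers, rounding down per service."""
    return (free_credits // credits_per_page) * services


# Assumed average cost: 10 credits per page (1,000 credits -> ~100 pages).
print(monthly_pages(1000, 10))               # one service
print(monthly_pages(1000, 10, services=10))  # ten services combined
```

So ten free tiers come to about 1,000 pages a month, assuming the average cost holds.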