Comment by renegat0x0

2 months ago

There are many nice HTTP clients:

- httpx

- curl_cffi

- httpmorph

- httpcloak

- stealth crawler

I wrote a framework (link below) that uses all of them. You can run each one against the same sites to compare crawling speed. Some sites can be crawled cleanly only with one particular client.
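To illustrate the kind of speed comparison I mean, here is a minimal sketch. It is not the crawler-buddy code; `time_fetchers` and the shape of its result are hypothetical names made up for this example. It assumes each client is wrapped in a plain function that takes a URL, and it measures only total wall-clock time and the number of failed requests.

```python
import time
from typing import Callable, Dict, List


def time_fetchers(
    fetchers: Dict[str, Callable[[str], object]],
    urls: List[str],
) -> Dict[str, Dict[str, float]]:
    """Run every fetcher over the same URLs and record time and failures.

    `fetchers` maps a label to any callable that takes a URL (hypothetical
    wrappers around whichever clients you want to compare).
    """
    results: Dict[str, Dict[str, float]] = {}
    for name, fetch in fetchers.items():
        failures = 0
        start = time.perf_counter()
        for url in urls:
            try:
                fetch(url)
            except Exception:
                # A failed request still counts toward the elapsed time.
                failures += 1
        elapsed = time.perf_counter() - start
        results[name] = {"seconds": elapsed, "failures": failures}
    return results
```

For example, passing `{"httpx": httpx.get}` together with a list of URLs would time httpx on those pages; the other clients would need their own small wrapper functions. Sequential timing like this is rough, so run it a few times before drawing conclusions.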

Having read the article, I am in pain. I do break things during development. I rewrite stuff. Maybe some day I will find a way to develop things in a "stable" way. One thing I try to keep in good shape is the Docker image. I update it once everything seems reasonably stable.

https://github.com/rumca-js/crawler-buddy