
Comment by jay-barronville

1 year ago

> I've now rebuilt the tool from the ground up, switching from a Puppeteer-based crawler to an invisible iframe approach.

Where can I go to learn more about your invisible `<iframe>` approach/implementation?

I figured it out mostly from first principles. It's a niche crawling method that suited my narrow use case perfectly, and there's a lot to say. But the main idea is that you can inject a crawling script into the HTML of the site via a proxy you control, e.g. `proxy.yoursite.com?url=<SITE_YOU_WANT_TO_CRAWL>`. Then, once you've got the data, you can call `window.postMessage(data)` to communicate with the main window.

It's somewhat similar to how browser proxies like https://proxyium.com/ and https://www.proxysite.com/ fetch the HTML on your behalf.
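
To make the idea a bit more concrete, here is a rough TypeScript sketch of the proxy-side pieces. It is an illustration under assumptions, not the actual tool's code: `PROXY_ORIGIN`, `MAIN_ORIGIN`, `buildProxyUrl`, `injectCrawler`, and the shape of the posted data are all hypothetical names and choices. It also assumes the injected script runs inside the iframe and therefore posts to `window.parent`, and that the proxy has already fetched the target page's HTML as a string.

```typescript
// Hypothetical origins: the proxy you control and the page hosting the iframe.
const PROXY_ORIGIN = "https://proxy.yoursite.com";
const MAIN_ORIGIN = "https://yoursite.com";

// Script the proxy adds to every page it serves. It runs inside the iframe,
// gathers some example data once the page has loaded, and posts it to the
// parent (main) window. The fields collected here are only an illustration.
const CRAWLER_SNIPPET = `<script>
window.addEventListener("load", () => {
  const data = {
    title: document.title,
    links: Array.from(document.querySelectorAll("a[href]"), (a) => a.href),
  };
  window.parent.postMessage(data, ${JSON.stringify(MAIN_ORIGIN)});
});
</script>`;

// Build the URL the hidden iframe points at, e.g.
// proxy.yoursite.com/?url=<SITE_YOU_WANT_TO_CRAWL> with the target URL-encoded.
function buildProxyUrl(target: string): string {
  return `${PROXY_ORIGIN}/?url=${encodeURIComponent(target)}`;
}

// Insert the crawler snippet just before the first closing </body> tag.
// If the page has no </body>, append the snippet to the end instead.
function injectCrawler(html: string, snippet: string = CRAWLER_SNIPPET): string {
  const closeBody = /<\/body\s*>/i;
  if (!closeBody.test(html)) {
    return html + snippet;
  }
  // A replacer function avoids "$" in the snippet being read as a pattern.
  return html.replace(closeBody, (match) => snippet + match);
}
```

On the main page, you would point a hidden `<iframe>` at `buildProxyUrl(target)` and listen for `message` events on `window`, checking `event.origin` against the proxy's origin before trusting `event.data`.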