Comment by haolez
4 days ago
I've seen a team implement Go workers that would download the HTML from a target, then download some of the referenced JavaScript files, then run those files in an embedded JavaScript engine so they could extract the specific data they needed with far fewer resources than a full browser. It's like a browser homunculus! Of course, each new site required custom code. This was for quant stuff. Quite cool!
This exact homunculus is actually supported in Node.js by the `jsdom` library: https://www.npmjs.com/package/jsdom
I don't know how well it would work for that use case, but I've used it before, for example, to write a web crawler that could handle client-side rendering.
Our primary use case with the AI stuff is not really scraping; we're mostly going after RPA.