Comment by xena
21 hours ago
Author of the article here. The behavior of the bot seems like this:
while (true) {
    const page = await load_html_page(read_from_queue());
    save_somewhere(page);
    for (const link of page.links) {
        enqueue(link);
    }
}
This means that every link on every page gets enqueued, fetched, and saved for later processing. Naturally, that means every file of every commit gets enqueued and scraped.
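To make the explosion concrete, here is a toy in-memory simulation of that loop against a forge-shaped site. Everything here is hypothetical (`toySite`, `crawl`, the URL scheme); it only exists to show that the page count is commits × files:

```javascript
// Toy illustration of why an enqueue-every-link crawler explodes on a Git
// forge: the index links to every commit, and every commit page links to a
// per-commit view of every file. All names here are hypothetical.

function toySite(commits, files) {
  // Build an in-memory "forge" as a map from URL to the links on that page.
  const pages = new Map();
  const commitLinks = [];
  for (let c = 0; c < commits; c++) {
    const fileLinks = [];
    for (let f = 0; f < files; f++) {
      const url = `/commit/${c}/file/${f}`;
      pages.set(url, []); // per-commit file views are leaves
      fileLinks.push(url);
    }
    pages.set(`/commit/${c}`, fileLinks);
    commitLinks.push(`/commit/${c}`);
  }
  pages.set("/", commitLinks);
  return pages;
}

function crawl(pages) {
  // The bot's loop from the comment above, minus politeness or robots.txt.
  const queue = ["/"];
  const seen = new Set();
  let fetches = 0;
  while (queue.length > 0) {
    const url = queue.shift();
    if (seen.has(url)) continue; // even with dedup, the page count explodes
    seen.add(url);
    fetches++;
    for (const link of pages.get(url)) queue.push(link);
  }
  return fetches;
}

const site = toySite(100, 50);
console.log(crawl(site)); // 1 index + 100 commits + 100*50 file views = 5101
```

A modest repo with 100 commits and 50 files already costs 5,101 fetches; real repos are orders of magnitude worse, and real forges also link every diff, blame, and tag view.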
Having everything behind auth defeats the point of making the repos public.
> Having everything behind auth defeats the point of making the repos public.
Maybe add a captcha? It can be something simple and ad hoc, but unique enough to throw off most bots.
That's what I'm working on right now.