
Comment by AznHisoka

16 hours ago

Excuse my technical ignorance, but is it actually trying to get every file in your git repo? If so, couldn't you just put everything behind a username and password?

Author of the article here. The bot seems to behave something like this:

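  while (true) {
    // Take the next URL off the crawl queue and fetch it
    const page = await load_html_page(read_from_queue());
    save_somewhere(page);
    // Enqueue every link found in the page's HTML
    for (const link of page.links) {
      enqueue(link);
    }
  }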

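This means that every link on every page gets enqueued, and every page gets saved for whatever the scraper wants it for. Since the repository's web interface links each commit to every file in it, every file of every commit naturally ends up enqueued and scraped.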
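To see how quickly this adds up, here is a minimal TypeScript sketch of that loop run against a toy, in-memory model of a repository browser, in which the front page links to every commit and each commit page links to every file at that commit. The site model and the names `buildRepoSite` and `crawl` are hypothetical. Unlike the bot's loop, this crawler skips URLs it has already seen so the example terminates, and it still fetches every file of every commit.

```typescript
// Hypothetical model of a repo web UI: URL -> links found on that page.
type Site = Map<string, string[]>;

function buildRepoSite(commits: number, files: number): Site {
  const site: Site = new Map();
  const commitUrls: string[] = [];
  for (let c = 0; c < commits; c++) {
    const commitUrl = `/commit/${c}`;
    commitUrls.push(commitUrl);
    const fileUrls: string[] = [];
    for (let f = 0; f < files; f++) {
      const fileUrl = `/commit/${c}/file/${f}`;
      fileUrls.push(fileUrl);
      site.set(fileUrl, []); // file pages link nowhere in this toy model
    }
    site.set(commitUrl, fileUrls); // each commit links to all of its files
  }
  site.set("/", commitUrls); // the front page links to every commit
  return site;
}

// Naive breadth-first crawler: fetch, save, enqueue every link.
// The `seen` set is the only guard; without it, a cycle in the
// link graph would make the queue grow forever.
function crawl(site: Site, start: string): string[] {
  const queue: string[] = [start];
  const seen = new Set<string>([start]);
  const saved: string[] = [];
  while (queue.length > 0) {
    const url = queue.shift()!;
    const links = site.get(url) ?? [];
    saved.push(url); // stands in for save_somewhere(page)
    for (const link of links) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return saved;
}

const saved = crawl(buildRepoSite(100, 50), "/");
console.log(saved.length); // 1 + 100 + 100 * 50 = 5101
```

With only 100 commits of 50 files each, that is already 5,101 page fetches. Many repository browsers also link per-file history, blame and diff views, which multiplies this further.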

Having everything behind auth defeats the point of making the repos public.

  • >Having everything behind auth defeats the point of making the repos public.

    Maybe add a captcha? It could be something simple and ad hoc, but unique enough to throw off most bots.
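A minimal sketch of the kind of simple, ad hoc challenge suggested above, assuming a server that asks a site-specific question once and then remembers the pass in a cookie. The cookie `repo_ok=1`, the expected answer and the functions `routeRequest` and `checkAnswer` are all hypothetical. A scraper written for this site could still get past it; the point is only that a generic crawler following links would not.

```typescript
// Hypothetical ad hoc challenge: a question a human can answer easily,
// remembered via a cookie once answered.
const CHALLENGE_COOKIE = "repo_ok=1"; // hypothetical cookie name and value
const EXPECTED_ANSWER = "git"; // e.g. "Which VCS hosts these repos?"

// Decide what to serve, given the request's raw Cookie header (if any).
function routeRequest(cookieHeader: string | undefined): "page" | "challenge" {
  const cookies = (cookieHeader ?? "").split(";").map((c) => c.trim());
  return cookies.some((c) => c === CHALLENGE_COOKIE) ? "page" : "challenge";
}

// Check a submitted answer, ignoring case and surrounding whitespace.
// On success the server would send `Set-Cookie: repo_ok=1` and redirect
// back to the page that was originally requested.
function checkAnswer(answer: string): boolean {
  return answer.trim().toLowerCase() === EXPECTED_ANSWER;
}

console.log(routeRequest(undefined)); // prints: challenge
console.log(routeRequest("theme=dark; repo_ok=1")); // prints: page
console.log(checkAnswer("  Git ")); // prints: true
```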