The author used to run a public Arch mirror under mirror.ext4.xyz, so it's not exactly an unknown domain.
Combined with the fact that a lot of their self-hosted stuff, including the Reddit front-ends, is in the Certificate Transparency logs [1], it's not hugely surprising that web crawlers would run into them.
[1]: https://crt.sh/?q=ext4.xyz
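For anyone wondering how CT logs lead crawlers to these hosts, here is a minimal sketch of enumerating hostnames for a domain. It assumes crt.sh's `output=json` query parameter and its `name_value` field, which holds the certificate's names separated by newlines. The function names are hypothetical, and this is not a claim about what any particular crawler actually does.

```python
import json
import urllib.parse
import urllib.request


def fetch_ct_entries(domain: str) -> str:
    """Fetch raw crt.sh JSON for certificates matching `domain` (makes a network call)."""
    # Assumption: crt.sh returns a JSON array when output=json is requested.
    url = "https://crt.sh/?" + urllib.parse.urlencode({"q": domain, "output": "json"})
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8")


def hostnames_from_ct(raw_json: str) -> list[str]:
    """Extract the unique, concrete hostnames from a crt.sh JSON response."""
    names = set()
    for entry in json.loads(raw_json):
        # name_value may contain several names, one per line.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            # Skip wildcards: a crawler cannot visit "*.example.org" directly.
            if name and not name.startswith("*."):
                names.add(name)
    return sorted(names)


# Usage with a made-up response (example.org names, not real log data):
sample = json.dumps([
    {"name_value": "a.example.org\nb.example.org"},
    {"name_value": "*.example.org"},
    {"name_value": "A.example.org"},
])
print(hostnames_from_ct(sample))
```

Every hostname that appears on a publicly logged certificate shows up this way, which is why "unlisted" self-hosted services still get found.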
I am definitely not surprised. It's quite normal, as I stated. What I was trying to say is that they are abusing the private front-ends to get around the legal restrictions on aggressively scraping the web.
that's not scraping, that's web search.
their scrapers wouldn't identify themselves.