Comment by ghm2199
19 hours ago
You can read https://hackernoon.com/the-long-now-of-the-web-inside-the-in... it was a nice look into their infra structure. One could theoretically build it. A few things stand out:
1. IIUC depends a lot on "Save Page Now" democratization, which could work, but its not like a crawler.
2. In absence of alexa they depend quite heavily on common crawl, which is quite crazy because there literally is no other place to go. I don't think they can use google's syndicated API, cause they would then start showing ads in their database, which is garbage that would strain their tiny storage budget.
3. Minor from a software engineering perspective but important for survival of the company: since they are an artifact of record storage, to convert that to an index would need a good legal team to battle google to argue. They do that the DoJ's recent ruling in their favor.
No comments yet
Contribute on Hacker News ↗