
Comment by dang

10 years ago

I'm open to working on it again. Qua user, I would love to be able to view Readability-style versions of stories quickly. And think of all the analytics people could do on a near-complete archive of all HN stories.

But it's a matter of priorities. Had it sped up moderation, it would have both paid for itself and made certain campers happier. But it didn't turn out that way. Beyond that, technically it's a nontrivial problem to get working on the full range of content, and then there are the nontechnical obstacles. We wouldn't do it without being sure we could release it.

Sending requests to Internet Archive might be an option if they'd be ok with it, but that of course would only help with caching, not decrufting.
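
To make that concrete, the Wayback Machine exposes a "Save Page Now" endpoint at https://web.archive.org/save/<url>; here's a rough sketch of firing a capture request for each submitted URL (purely illustrative, assuming Python and the requests library, not anything HN actually runs):

    # Illustrative sketch only: ask the Wayback Machine to capture a submitted URL.
    import requests

    def archive_url(url, timeout=60):
        """Request a capture of `url` and return the snapshot URL."""
        resp = requests.get("https://web.archive.org/save/" + url, timeout=timeout)
        resp.raise_for_status()
        # After redirects, resp.url usually points at the stored snapshot,
        # e.g. https://web.archive.org/web/<timestamp>/<url>
        return resp.url

    print(archive_url("https://example.com/some-story"))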

Totally understand on all points.

Caches on their own would be totally worthwhile, even without de-crufting. If IA are up for it, HN as a signal for relevance would likely be valuable to them. Talk to Brewster.

As I said, decrufting/readability would be a really nice value-add, personally. Readability themselves have an API for this, which might be one way to approach the concept, and they've done much of the heavy lifting in terms of sorting out sites' various CSS/HTML cruft and sanitizing it. I do my own pretty significant CSS restructuring locally (we've chatted about this before w/ HN), and with some 1800+ individual sites' CSS modified to some extent or another, I've got a really good idea of just how effed up the stuff can be.
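
To give a rough sense of what the decrufting step looks like, here's a sketch using the readability-lxml Python package (my illustrative stand-in, a port of the same arc90 readability algorithm, not Readability's hosted API):

    # Sketch of local article extraction with the readability-lxml package
    # (chosen for illustration; not Readability's hosted Parser API).
    import requests
    from readability import Document

    def decruft(url):
        """Fetch a page and return a cleaned, article-only HTML fragment."""
        html = requests.get(url, timeout=30).text
        doc = Document(html)
        # summary() drops nav, ads, and most site chrome, keeping the article body.
        return "<h1>%s</h1>\n%s" % (doc.short_title(), doc.summary())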

I totally agree with Nicolás Bevacqua's "Stop Breaking the Web", posted yesterday.

But on an effort/reward basis as a greenfield project, likely not worth it. Going with Readability (or Instapaper, or Pocket) themselves could well be worth investigating.

As a suggestion: another approach would be to simply reject submissions which aren't accessible via some putative minimal client. If enough aggregators started penalising sites for inaccessible content, they might start wising up.
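
A crude version of that check might look like the sketch below (thresholds, names, and libraries are all illustrative assumptions, not a proposal for HN's actual code): fetch the page without executing any JavaScript and see whether a meaningful amount of text survives.

    # Illustrative "minimal client" check: does the page deliver readable text
    # without running any JavaScript? Threshold and names are assumptions.
    import requests
    from bs4 import BeautifulSoup

    def readable_without_js(url, min_words=100):
        html = requests.get(url, timeout=30).text
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup(["script", "style", "noscript"]):
            tag.decompose()
        words = soup.get_text(separator=" ").split()
        return len(words) >= min_words

    # A submission failing this could be flagged, penalised, or rejected.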