Comment by iansinnott
2 years ago
It doesn't take up too much space. Currently about 70mb on my system. It will grow over time of course.
It uses a distilled version of the web page, i.e. "reader mode", and indexes that rather than the full HTML. So yes, it indexes plain text only and, in theory, ignores headers, footers, and other non-interesting parts of web pages.
Anecdotally, I find it invaluable for finding a web page I know I've seen but can't remember.
It's not open source as of this comment, but the plan is to open source it. Agreed that OSS for something like this is important. It will index all your authenticated pages too.
It's also not minified or obfuscated, so the source is "available" in that sense.
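The distill-then-index approach described above can be sketched in a few lines. This is not the extension's actual code (which isn't published yet) — just a minimal, hypothetical illustration using Python's stdlib `html.parser` for a crude reader-mode pass and SQLite's FTS5 for the full-text index:

```python
import sqlite3
from html.parser import HTMLParser

# Tags whose contents count as page "chrome" rather than content.
SKIP = {"header", "footer", "nav", "aside", "script", "style"}

class Distiller(HTMLParser):
    """Crude reader-mode pass: keep body text, drop chrome tags."""
    def __init__(self):
        super().__init__()
        self.depth = 0       # nesting level inside skipped tags
        self.chunks = []
    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.depth += 1
    def handle_endtag(self, tag):
        if tag in SKIP and self.depth:
            self.depth -= 1
    def handle_data(self, data):
        if not self.depth and data.strip():
            self.chunks.append(data.strip())

def distill(html):
    d = Distiller()
    d.feed(html)
    return " ".join(d.chunks)

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE pages USING fts5(url UNINDEXED, body)")

html = """<html><body>
<header>Site chrome</header>
<article>SQLite makes full-text history search easy.</article>
<footer>Copyright</footer>
</body></html>"""
db.execute("INSERT INTO pages VALUES (?, ?)",
           ("https://example.com", distill(html)))

# Only the distilled article text is searchable; header/footer text is not.
hits = db.execute("SELECT url FROM pages WHERE pages MATCH ?",
                  ("history",)).fetchall()
```

A real reader-mode pass (e.g. Mozilla's Readability, which the extension may or may not use) scores content blocks heuristically rather than just skipping tags, but the index shape is the same idea.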
> It doesn't take up too much space. Currently about 70mb on my system. It will grow over time of course.
If the publication date (Nov 30) is any indication of how long you've been using it, it's not exactly lightweight :P
> It's not open source as of this comment, but the plan is to open source it
Might be really useful with some configuration around which domains to archive; I may contribute (or fork if you're not accepting contributions) if it is released under an OSS license
> It's also not minified or obfuscated, so the source is "available" in that sense.
Amazing!
Cheers! Would be happy to have contributions.
I've been using it for longer than the publish date — a few months at this point. It's also roughly twice the size it needs to be, I think, as the FTS index could be optimized. So there's room for improvement.