Comment by harianus

7 years ago

- No plans to go open source with the backend, but I do show the code that is run in the browser. The visualisation of the data is not super important I think.
- I don't save IPs, not even in the logs.
- I don't have unique pageviews at the moment. I will in the future. If the referrer is the same as the current page, I will measure that as a non-unique. What do you think?
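For readers trying to picture the referrer rule described above, here is a minimal sketch in Python. It is not the service's actual code: the function names, the URL normalisation (lowercased host, trailing slash ignored, query string dropped) and the choice to treat a missing referrer as unique are all assumptions made for illustration.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def _page_key(url: str) -> Tuple[str, str]:
    """Reduce a URL to (host, path) so trivial variations compare equal.

    Hypothetical helper: ignoring the query string and trailing slash
    is an assumption, not something stated in the thread.
    """
    parts = urlsplit(url)
    return (parts.netloc.lower(), parts.path.rstrip("/") or "/")


def is_unique_view(referrer: Optional[str], current_url: str) -> bool:
    """Return False when the referrer is the current page itself."""
    if not referrer:
        # Assumption: no referrer (direct visit, stripped header) counts as unique.
        return True
    return _page_key(referrer) != _page_key(current_url)
```

Note that this needs no IP address or cookie, but it only catches reloads and self-links. A visitor who leaves and comes back from another page would be counted as unique again.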

If you don't go open source, will you at least offer paid self-hosting (similar to what e.g. Atlassian offers)?

The idea of privacy is much easier to sell if the data never leaves your own server, instead of using some analytics provider that might be run by the CIA or the Russian mafia for all we can prove.

> What do you think?

Apart from the unfortunate non-open-source answer, this sounds great!

I get others' concerns about wanting unique pageviews, but that metric is always a bit of a sketchy either-or for extremely privacy-conscious people. It's both an incredibly valuable metric, and also one that's difficult to square with complete privacy (basically it's always going to be pseudonymous at best).

Have you considered using a shared-source license, where paying customers can inspect and build from the source, and where people can obtain the source freely for academic research and/or security reviews?

Shared-source proprietary licensing goes as far back as the Burroughs B5000 mainframe, whose customers got the source and could send in fixes/updates. Microsoft has a Shared Source program. Quite a few suppliers in embedded do it. There's also a company that sells UI software which gives the source to customers buying the higher-priced version.

I will warn that people might still rip off and use your code. Given it's JavaScript, I think they can do that anyway with reverse engineering. It also sounds like they could build it themselves anyway. Like most software bootstrappers or startups, you're already in a race with other players that might copy you with clean-slate implementations. So, I don't know if the risk is that big a deal or not. I figured I should mention it for fairness.

Doesn't seem like a very useful measure of uniqueness.

What if you had one-day retention of IP addresses for per-day unique views? Seems like too important of a metric to eliminate completely, and one-day retention seems like a decent trade-off at the expense of being able to do unique analysis over longer time periods.

  • Don’t retain the IP address, retain a hash of the IP address.

    • Not private enough as the space of IP addresses is too small. Removing the last octet of IPv4 addresses before storing them should provide better privacy.

    • A plain hash doesn't make a difference.

      One can, though, use hashes with regularly changing salts that are destroyed after a while, which makes older hashes unusable for some purposes.

    • When you can trivially crawl the input space, as with IPv4 addresses, you'd have to expire a fresh per-day salt as well.

      But to my eyes, expiring salts isn't much different than deleting IP addresses after one day. Just more machinery. People have to trust that you're doing either, so why bother beyond being able to use the word "hashing" in marketing language?

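To make the options in this subthread concrete, here is a rough Python sketch of the two ideas: zeroing the last octet of an IPv4 address, and counting per-day uniques with a keyed hash whose salt is regenerated each day and never written to disk. The names `truncate_ipv4` and `DailySaltedCounter` are hypothetical, and using HMAC-SHA256 with an in-memory set is an assumption for illustration, not anything the service is known to do.

```python
import hashlib
import hmac
import ipaddress
import secrets
from datetime import date
from typing import Optional, Set


def truncate_ipv4(ip: str) -> str:
    """Zero the last octet of an IPv4 address, keeping only its /24 network."""
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network.network_address)


class DailySaltedCounter:
    """Per-day unique counter (hypothetical sketch).

    Only keyed hashes are kept, and the salt is replaced whenever the
    day changes, so earlier digests cannot be recomputed afterwards.
    """

    def __init__(self) -> None:
        self._day: Optional[date] = None
        self._salt: bytes = b""
        self._seen: Set[str] = set()

    def _rotate(self, today: date) -> None:
        if today != self._day:
            self._day = today
            # Fresh random salt; the previous one is simply dropped.
            self._salt = secrets.token_bytes(32)
            self._seen = set()

    def record(self, ip: str, today: date) -> bool:
        """Return True if this IP has not been seen yet on the given day."""
        self._rotate(today)
        digest = hmac.new(self._salt, ip.encode(), hashlib.sha256).hexdigest()
        is_new = digest not in self._seen
        self._seen.add(digest)
        return is_new
```

While the day's salt is still in memory, the operator can still enumerate all of IPv4 and recover the addresses, so this gives roughly the same guarantee as deleting raw IPs after one day, as argued above. Either way, users have to trust that the salt or the addresses are really discarded.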

> No plans to go open source with the backend

You say that you do not store IP addresses, but why should anybody believe it?

Modern security is based on proof, not on trust.

  • > You say that you do not store IP addresses, but why should anybody believe it?

    I can show the code, I will probably do this in my next blog post, but that does not guarantee anything.

    > Modern security is based on proof, not on trust.

    Is it? So if there is a hosted version of an open source tool, you are sure they use the same code on the hosted version as in the open source tool? It's still based on trust.