← Back to context

Comment by pdkl95

7 years ago

> unnecessarily track users without their consent

Regardless of your intentions, you are collecting enough data to track users.

> I am transparent about what I collect ([URL])

That page doesn't mention that you are also collecting (and make no claim about storing) the globally-visible IP address (and any other data in the IP and TCP headers). This can be uniquely identifying; even when it isn't unique you usually only need a few bits of additional entropy to reconstruct[1] a unique tracking ID.

In my case, you're collecting and storing more than enough additional entropy to make a decent fingerprint because [window.innerWidth, window.innerHeight] == [847, 836]. Even if I resized the window, you could follow those changes simply by watching analytics events from the same IP that are temporally nearby (you are collecting and storing timestamps).

[1] An older comment where I discussed how this could be done (and why GA's supposed "anonymization" feature (aip=1) is a blatant lie): https://news.ycombinator.com/item?id=17170468

I think there's value in at least distributing the data that's collected. I may not like that the analytics provider has my data, but it seems like a lesser evil if that provider isn't also the world's largest ad company and they aren't using it to build profiles behind the scenes to track my every move across a significant part of the Internet.

Given the choice between a lot of data about me given to a small provider and somewhat less data about me given to Google, I'd generally choose the former.

  • Thats no a good way to make a decision. Big,small doesn't matter. What matters is who is providing better security? When 2 parties big,small are collecting data ,then the party which can act on security vulnerabilities quickly and has great security engineers and dedicated teams like Project Zero- is the much better choice. People nowadays assume that a small,indie developer is a good guy. I am just pointing out that this is a very bad bias to have. Technicalities matter, security robustness matters. Google might be collecting data,but their security is really good. Good effort by this dev though.

    • I totally agree on the security aspect, but I think we're talking about different threat models.

      Security matters if your concern is the data leaking to a potential malicious actor. The concern that I'm speaking to is the intended use of the data. Google is definitely going to use it for ad targeting and building a "shadow profile", but a small developer probably won't. This one says they won't, but even if they do they're likely to be much less effective than Google would be.

      10 replies →

    • > When 2 parties big,small are collecting data ,then the party which can act on security vulnerabilities quickly and has great security engineers and dedicated teams

      This cannot be stressed enough. At my day job I write reasonably secure software on a team for big clients, then at home I write reasonably secure software independently for small clients.

      Come new security issue, the big clients at day job get first priority. Not because they are big and not because they are paying more, but rather because as a team we can reallocate resources and work on issues in parallel. At home, there is only one Dotan to work on each independent client in series.

    • Better than Google "having great security" would be if Google was not collecting that much information in the first place.

  • I think how the data is used is also a big factor.

    There is 'justice' in the blog creator using analytics data to to improve the experience of blog visitors: a user's data will, theoretically and in aggregate, create a better experience for that user in the future. The class of 'users who browse this page' gets a benefit from the cost of providing data.

    Selling browsing information to advertisers is sort of 'anti-justice'. Using blog visitor data to track and more effectively manipulate those visitors elsewhere on the internet into paying people money. The blog visitor's external online experience is made worse by browsing that blog.

Good comment! I only store the window.innerWidth metric. I updated the what we collect page (https://simpleanalytics.io/what-we-collect) to reflect the IP handling. We don't store them. And fingerprinting is something that would be definitely tracking, not on my watch!

  • > We don't collect and store IPs.

    First, "IPs" might be confusing; "IP addresses" would be more accurate.

    More importantly, you have to collect IP addresses (or any other value in the packet headers[1][2]) - even if you don't store it - if you want to receive any packets from the rest of the internet. Storage of those values is separate issue entirely, and it's good to hear that you are intending to NOT store IP addresses (and updating the documenting)!

    Also, I strongly recommend using Drdrdrq's suggestion to lower the precision of the collected window dimensions, which should be done on the client i.e. "Math.floor(window.innerWidth/50)*50". This kind of bit-reduction makes fingerprinting a lot harder.

    [1] https://en.wikipedia.org/wiki/IPv4#Header

    [2] https://en.wikipedia.org/wiki/Transmission_Control_Protocol#...

    • I would argue that in the conversational context "collect" is more a synonym for "store" than for "receive" or "see". Moreso in the context of a tracking system. In my opinion anyway.

  • There is absolutely no reason to collect and store window dimensions, other than for fingerprinting and tracking. Sure it might be an interesting piece of trivia for the dev, but it's not necessary for the dev to "make sure the website works great on all of those dimensions", since that much is already obvious and presumed when making websites these days.

    • Actually there is, this is one of the most interesting metrics. It doesn't have to be precise though, rounding to nearest 50px would be more than enough. I would argue that height and aspect ratio should be collected too. (I didn't downvote you FWIW)

      2 replies →

    • Could there not be value in knowing how many pixels your users have available to view your things? You could presumably get that information from device characteristics but then could also presumably use that for fingerprinting.

      8 replies →

    • Besides... optimizing a site for specific window dimensions? If I see conversion rate is lower on a certain band of dimensions, something likely doesn't display properly. It'd be impossible to test every dimension.

That page doesn't mention that you are also collecting (and make no claim about storing) the globally-visible IP address (and any other data in the IP and TCP headers). This can be uniquely identifying; even when it isn't unique you usually only need a few bits of additional entropy to reconstruct[1] a unique tracking ID.

This is true. The legal department for the healthcare web sites I maintain doesn't let me store or track IP addresses, even for analytics.

I'm only allowed to tally most popular pages, display language chosen, and date/time. There might be one or two other things, but it's all super basic.

> That page doesn't mention that you are also collecting (and make no claim about storing) the globally-visible IP address

I’m not the OP, but where is there evidence that they’re storing the IP? Sure it’s in the headers that they process but that doesn’t mean they’re storing it.

>Regardless of your intentions, you are collecting enough data to track users.

I'd imagine it's difficult to do in depth analytics with tracking users...