← Back to context

Comment by ucarion

7 years ago

First off: hats off for making a product that takes the rights of the end user seriously!

However, I am a bit confused as to who would want this product. The sort of questions this product answers seem quite limited:

1. What URLs are getting lots of hits?

2. What referrers are generating lots of hits?

3. What screen sizes are those hits coming from?

What decisions can be drawn from those questions? This seems useful only to perhaps some blog, where they're wondering what sort of content is successful, where to advertise more, and whether to bother making a mobile website.

Without the ability to track user sessions -- even purely in localStorage -- you can't correlate pageview events. For instance, how would I answer a question like:

- How many high-interest users do I have? By "high interest", I mean someone who visited at least three pages on my website.

- Is a mobile website really worthwhile? How much of an effect does being on mobile have on whether someone will be "high-interest"?

I should think some anonymized user ID system -- even if it rotates anonymous IDs -- should be able to answer these questions without compromising privacy.

Also, I'll leave it to others to point out it's unlikely this product is exempt from GDPR.

Since the creator points out that he doesn't store any IP addresses, he doesn't store any data that allows identifying an individual. For the GDPR to be applicable you need to store data that allows you to identify an individual. Thus when you use this, you don't have to think about GDPR.

  • I'm not so sure. By putting this service's code on your website, you transmit personal data (IP addresses) to this third party. That appears to make the GDPR applicable here? Transmission is considered "data processing" under the GDPR.

    Really, the central point that should be clear is that this is a question for lawyers. The GDPR is incredibly far-reaching.

    • The IP necessary for the connection itself is covered under necessary data, you can process it for the purpose of a request without needing consent at all. Same applies to shopping cart cookies or anything else that is essential to running a website and isn't being used for secondary purposes like data mining.

      2 replies →

    • I mean, sure GDRP applies, but little of it’s provisions apply to storing no PII at all.

      If it means your website has to show a message ‘We transmit your info, but save nothing.’ It becomes a bit weird.

Hi,

I might be able to help because I wrote an analytics tool a while back that tracks these three properties and some other stuff

1. Knowing which URLs are being visited allows me to see if a particular campaign or blog site is popular

2. The referrer tells me where a user came from, this is helpful to know if I'm being linked to reddit and should allocate more CPU cores from my host to the VMs responsible for a particular service

3. The screen size allows me to know what aspect ratios and sizes I should optimize for. My general rule is that any screen shape that can fit a 640x480 VGA screen without clipping should allow my website to be fully readable and usable.

4. I also track a trimmed down user agent; "Firefox", "Chrome", "IE", "Edge", "Safari" and other. All will include "(recent)" or "(old)" to indicate version and other will include the full user agent. This allows me to track what browsers people use and if people use outdated browsers ("(old)" usually means 1 year out of date, I try to adjust it regularly to keep the interval shorter)

5. Page Load Speed and Connection. This is a number in 10ms steps and a string that's either "Mobile" or "Wired", which uses a quick and dirty heuristic to evaluate based on if a connection is determined to be throttled, slow and a few other factors. Mobile means people use my website with devices that can't or shouldn't be drawing much bandwidth, Wired means I could go nuts. This allows me to adjust the size of my webpage to fit my userbase.

6. GeoIP: This is either "NAm", "SAm", "Eur", "Asi", "Chin", "OcA", "NAf", "SAf", "Ant" or "Other". I don't need to know more than the continent my users live on, it's good enough data. I track Chinese visitors separately since it interests me.

Overall the tool is fairly accurate and high performance + low bandwidth (a full analytics run takes 4KB of bandwidth including the script and POST request to the server). It doesn't collect any personal data and doesn't allow accurate tracking of any individual.

If I want to track high interest users, I collate some attributes together (Ie, Screen Size, User Agent, Continent) which gets me a rough enough picture of high interest stuff for what I care. You don't need to track specific user sessions, that stuff is covered under the GDPR and not necessary.

Before anyone asks if they could have this tool; nope. It's proprietary and mine. The code I've written for it isn't hard, very minimal and fast. I wrote all this over a weekend and I use influx + grafana for the output. You can do that too.

Both mine and the product of the HN post are likely not in the scope of the GDPR since no data is collected that can specifically identify a user.