Comment by janpot

6 years ago

Not endorsing this, but according to https://www.google.com/chrome/privacy/whitepaper.html#variat...

> We want to build features that users want, so a subset of users may get a sneak peek at new functionality being tested before it’s launched to the world at large. A list of field trials that are currently active on your installation of Chrome will be included in all requests sent to Google. This Chrome-Variations header (X-Client-Data) will not contain any personally identifiable information, and will only describe the state of the installation of Chrome itself, including active variations, as well as server-side experiments that may affect the installation.

> The variations active for a given installation are determined by a seed number which is randomly selected on first run. If usage statistics and crash reports are disabled, this number is chosen between 0 and 7999 (13 bits of entropy). If you would like to reset your variations seed, run Chrome with the command line flag “--reset-variation-state”. Experiments may be further limited by country (determined by your IP address), operating system, Chrome version and other parameters.

This is impressive doublespeak.

> This ... header ... will not contain any personally identifiable information

> a seed number which is randomly selected on first run ... chosen between 0 and 7999 (13 bits of entropy)

They are not including any PII... while creating a new identifier for each installation. 13 bits of entropy probably isn't a unique identifier iff you only look at that header in isolation. Combined with at least 24 additional bits[1] of entropy from the IPv4 Source Address field Google receives >=37 bits of entropy, which is almost certainly a unique ID for the browser. Linking that browser ID to a personal account is trivial as soon as someone logs in to any Google service.
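Here is the arithmetic above as a quick Python sketch. It makes the same assumptions the comment does: the seed is uniform over 0..7999, and the /24 prefix contributes a full 24 independent bits. Real IPv4 addresses are far from uniformly distributed, so treat the combined figure as an upper bound on the entropy, not a measurement.

```python
import math

# From the whitepaper quote: with usage statistics and crash reports disabled,
# the variations seed is chosen from 0..7999, i.e. 8000 possible values.
SEED_VALUES = 8000

# Assumption taken from the comment: at least the top 24 bits of the IPv4
# source address (its /24 prefix) survive even if the last byte is zeroed.
IPV4_PREFIX_BITS = 24


def entropy_bits(n_values: int) -> float:
    """Entropy in bits of a value drawn uniformly from n_values possibilities."""
    return math.log2(n_values)


seed_bits = entropy_bits(SEED_VALUES)          # just under 13 bits
combined_bits = seed_bits + IPV4_PREFIX_BITS    # bits add only if the sources are independent
combined_values = SEED_VALUES * 2**IPV4_PREFIX_BITS

print(f"seed alone:        {seed_bits:.2f} bits")
print(f"seed + /24 prefix: {combined_bits:.2f} bits ({combined_values:,} combinations)")
```

Since log2(8000) is slightly under 13, the combined figure comes out just under 37 bits rather than at or above it.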

> Experiments may be further limited by country (determined by your IP address)

They even admit to inspecting the IP address...

> operating system, Chrome version and other parameters.

...and many additional sources of entropy.

[1] Why 24 bits instead of 32? The least significant byte of the address might be zeroed if the packet is affected by Google's faux-"anonymization" feature ( https://news.ycombinator.com/item?id=15167059 )

  • > > Experiments may be further limited by country (determined by your IP address)

    > They even admit to inspecting the IP address...

    I don't think that sentence admits what you say? Chrome could be determining which experiments to run client-side.

    Of course, when you visit a Google property, they needs must inspect your IP address to send a response to you, at a minimum. That goes for any site you might choose to visit. The existence of sufficient entropy to personally identify a site visitor is not a state secret. They do not need this chrome experiment seed to identify you, if that's a goal.

    • Yeah, it's not a "state secret" but it's not common knowledge either. Their privacy policy says that specific header can't be used to identify you, but fails to mention it can be combined with other information to make browser fingerprinting trivial.

      If you don't know how all this works, which is true for most human beings, their privacy policy might give you the wrong impression.

    • So if you use a VPN service, for example, they still know who you are because of this. I would say even if you’re browsing in private mode.

      I see your point, but I also see how this will keep you identifiable.

  • > They are not including any PII... while creating a new identifier for each installation. 13 bits of entropy probably isn't a unique identifier iff you only look at that header in isolation. Combined with at least 24 additional bits[1] of entropy from the IPv4 Source Address field Google receives >=37 bits of entropy, which is almost certainly a unique ID for the browser. Linking that browser ID to a personal account is trivial as soon as someone logs in to any Google service.

    Now this is interesting. Without those 13 bits of entropy, what would Google lose? Do those 13 bits suddenly let Google track what it otherwise could not? If the IPv4 address, user-agent string, or some other behavior is already sufficient to reveal a great deal, we have a more serious problem than those 13 bits. I agree that the 13-bit seed is a concern, but I wonder whether it is a concern per se or only in combination with something else. Of course, how and whether Google retains that data also matters.

    • One clarification:

      - By default it's much more than 13 bits of entropy

      - If you disable usage statistics then you are limited to 13 bits of entropy

    • >Now this is interesting. Without those 13 bits of entropy, what would Google lose? Do those 13 bits suddenly let Google track what it otherwise could not?

      At the very least, having those 13 bits of entropy along with a /24 subnet allows you to have device-level granularity, whereas a /24 subnet may be shared by hundreds of households.

  • > This ... header ... will not contain any personally identifiable information

    Except for everything you do in your browser. I'm so glad I haven't used Chrome for almost three years.

  • >Linking that browser ID to a personal account is trivial as soon as someone logs in to any Google service.

    Wat? You mean to tell me they can identify you if you log into their service?

    Am I missing something here? Who cares?

    • I care. I care that even if I log off, even if I use a VPN, even if I go into incognito mode, they can still associate my requests with the account I initially logged in with.

    • Normally you would only expect to be identified and tracked when using Google services while logged in. The significance of this post is that they would be able to identify and track you across all your usage of that browser installation, regardless of whether you've logged out or are, say, in an incognito window.

    • Yes you are missing something important. Once they've tied the browser ID to your personal account they can track you across all google properties, even the ones that you didn't log into.

    • If you browse the internet, they could know what websites are visited by the same person, but not who they are exactly.

      If you visit a load of websites, then also log into Google, they connect the two and they know what websites were visited by you specifically.
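One reply points out that a /24 subnet may be shared by hundreds of households. To put rough numbers on that, here is a minimal Python sketch of how likely it is that another installation behind the same /24 happens to have your seed. It assumes the opt-out case (8000 seed values) with every seed drawn independently and uniformly. The neighbour counts in the loop are illustrative, not measured.

```python
# Assumption: the opt-out case from the whitepaper, i.e. 8000 possible seeds,
# each installation's seed drawn independently and uniformly.
SEED_VALUES = 8000


def p_seed_collision(others: int, seed_values: int = SEED_VALUES) -> float:
    """Probability that at least one of `others` other installations behind
    the same /24 has exactly the same seed as yours."""
    return 1.0 - (1.0 - 1.0 / seed_values) ** others


# Illustrative neighbour counts for one /24, not measured figures.
for others in (10, 100, 300, 1000):
    print(f"{others:>5} other installations: {p_seed_collision(others):.1%} chance of a shared seed")
```

A lower collision probability means the (/24, seed) pair singles out one installation more often. With a few hundred neighbours, a shared seed is still the exception rather than the rule.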

The key in the wording is: "If usage statistics and crash reports are disabled, this number is chosen between 0 and 7999 (13 bits of entropy)."

"If" statistics are disabled.

In chrome://version you can see the active variations. They are fairly big numbers, big enough to be significant, and so far I haven't observed any duplicates.

Since this header is generated server-side, I guess you just have to take their word for it? Plus, why would DoubleClick need it? :)

  • That's basically saying "even if you opt out, we'll still try to track you, just not as much." Very unpleasant, but then again I'm not surprised to see this attitude from Google.

    • Combine a few pieces of information like this and you get a decisively unique fingerprint.

      For example, identifying individuals at work behind the same IP address.

How many people will actually run Chrome with a CLI flag? It would be pretty impressive if every single person reading this thread did, but it probably won't even be that. Most people don't even touch their settings.

13 bits of entropy is far from a UUID (and you need to disable more settings to even get it down to 13 bits, which again very few people do), but it's still plenty to disambiguate individuals over time.

  • And Google is certainly in a position to tie that UUID to an individual as soon as they log in to Gmail or any other Google property!

Is there a reason for only sending this header to Google web properties and not all domains?

  • It is an abuse of Chrome's position in the marketplace. Google is using their powerful position to give themselves tracking capabilities that other online players can't access. It is a major competitive advantage for Google.

    • Can't alternative browser makers who build on Chromium simply disable that portion? Like, I expect identifying users was a key business concern in moving Edge to Chromium. Is there something (other than work) preventing them from making it report back to Microsoft-owned domains instead?

  • Is it because Google's webapps will have their own a/b tests which use experimental features only available in Chrome perhaps?

    I mean, personally I think they should do client-side feature detection and go back to being standards-compliant and not creepy. The only reason I'd consider such a flag is if they optimize the payload server-side to return a certain A/B test variant, but even then they could serve the default version first, do feature detection, and then set a session cookie for that domain only that loads the A/B test.

    My other thought was that they test a feature that is implemented across Google's properties, e.g. something to do with their account management.

Couldn't Chrome installations receive a request from Google that says "Do you want to try out a new thing?", and couldn't the installations say yes with a certain probability? The only difference I can see is that the subset of users who are guinea pigs couldn't be the same in each test (if Google wanted the subset to always be the same).
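
The comment above proposes a stateless alternative to a persistent seed. The two approaches can be compared in a small Python sketch. Both functions and the experiment names are hypothetical, and neither is how Chrome actually assigns variations. `coin_flip_enrollment` is the stateless "do you want to try a new thing?" idea. `seed_enrollment` derives every assignment from one persistent seed by hashing it together with the experiment name.

```python
import hashlib
import random

# Hypothetical experiment names, purely for illustration.
EXPERIMENTS = ["NewTabLayout", "FasterScroll", "TabGroups"]


def coin_flip_enrollment(probability: float, rng: random.Random) -> dict[str, bool]:
    """Stateless scheme: each experiment is accepted independently with
    `probability`. Nothing persistent is needed, but a fresh call gives a
    fresh, unrelated set of answers."""
    return {name: rng.random() < probability for name in EXPERIMENTS}


def seed_bucket(seed: int, experiment: str) -> float:
    """Map (seed, experiment) to a pseudo-random number in [0, 1)."""
    digest = hashlib.sha256(f"{seed}:{experiment}".encode()).digest()
    # Keep 53 bits so the value is exactly representable as a float below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def seed_enrollment(seed: int, probability: float) -> dict[str, bool]:
    """Seed-based scheme: assignments are a pure function of the seed, so the
    same seed always lands in the same groups."""
    return {name: seed_bucket(seed, name) < probability for name in EXPERIMENTS}


if __name__ == "__main__":
    print("coin flips:", coin_flip_enrollment(0.1, random.Random()))
    print("seed 4242 :", seed_enrollment(4242, 0.1))
    print("seed 4242 :", seed_enrollment(4242, 0.1))  # identical to the line above
```

The main design difference is persistence. The seed-based version gives the same answer every time for the same seed, so an installation stays in the same groups across restarts without storing each decision, but it has to carry a long-lived value. Mixing the experiment name into the hash lets different experiments draw different subsets. Hashing the seed alone would pick the same subset in every test, which is the property the comment mentions.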

So they're tracking people and using them as guinea pigs, the lack of respect for users is astounding.