Comment by pdkl95
6 years ago
This is impressive doublespeak.
> This ... header ... will not contain any personally identifiable information
> a seed number which is randomly selected on first run ... chosen between 0 and 7999 (13 bits of entropy)
They are not including any PII... while creating a new identifier for each installation. 13 bits of entropy probably isn't a unique identifier iff you only look at that header in isolation. Combined with at least 24 additional bits[1] of entropy from the IPv4 Source Address field Google receives >=37 bits of entropy, which is almost certainly a unique ID for the browser. Linking that browser ID to a personal account is trivial as soon as someone logs in to any Google service.
> Experiments may be further limited by country (determined by your IP address)
They even admit to inspecting the IP address...
> operating system, Chrome version and other parameters.
...and many additional sources of entropy.
[1] Why 24 bits instead of 32? The LSB of the address might be zeroed if the packet is affected by Google's faux-"anonymization" feature ( https://news.ycombinator.com/item?id=15167059 )
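For anyone who wants to check the arithmetic in the comment above, here is a minimal sketch. It assumes the seed is uniform over its 8000 values and independent of the top 24 bits of the IPv4 address, which is an idealization (real addresses are far from uniformly distributed, so treat the sum as an upper bound). The function name is made up for illustration.

```python
import math

def entropy_bits(n_values: int) -> float:
    """Bits of entropy for a uniform choice among n_values outcomes."""
    return math.log2(n_values)

seed_bits = entropy_bits(8000)        # seed chosen between 0 and 7999
ip_bits = 24                          # IPv4 address with the low byte zeroed
combined_bits = seed_bits + ip_bits   # only valid if the two are independent

print(f"seed: {seed_bits:.2f} bits")
print(f"seed + 24-bit prefix: {combined_bits:.2f} bits")
print(f"distinct combinations: {8000 * 2**24:,}")
```

The seed comes out slightly under 13 bits, since 2^13 = 8192 and there are only 8000 values, so the combined figure is slightly under 37 as well.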
> > Experiments may be further limited by country (determined by your IP address)
> They even admit to inspecting the IP address...
I don't think that sentence admits what you say? Chrome could be determining which experiments to run client-side.
Of course, when you visit a Google property, they needs must inspect your IP address to send a response to you, at a minimum. That goes for any site you might choose to visit. The existence of sufficient entropy to personally identify a site visitor is not a state secret. They do not need this Chrome experiment seed to identify you, if that's a goal.
Yeah, it's not a "state secret" but it's not common knowledge either. Their privacy policy says that specific header can't be used to identify you, but fails to mention it can be combined with other information to make browser fingerprinting trivial.
If you don't know how all this works, which is true for most human beings, their privacy policy might give you the wrong impression.
> says that specific header can't be used to identify you
That's not what it says. It says the header won't contain PII, which is true. It can be linked to PII, but so can literally every bit of information you send to Google while logged into or otherwise using their services. A disclaimer to this effect would not have any purpose.
So if you use a VPN service for example, they still know who you are because of this. I would say even if you’re visiting in private mode.
I see your point, but I also see how this will keep you identifiable.
I don't math very much, but I would guess the intersection of these sets of people is nil: people who 1) use VPN to avoid tracking by Google 2) still log in to Google services from one of their networks and not the other 3) use the same Chrome profile on both. But suppose some small number exist who adopt this illogical and contradictory pattern of behavior. If Google is using this token for the purpose of tracking this tiny set of people when the vast majority could be tracked more easily via conventional means, it would imply that they are far more competent than I give them credit for.
> They are not including any PII... while creating a new identifier for each installation. 13 bits of entropy probably isn't a unique identifier iff you only look at that header in isolation. Combined with at least 24 additional bits[1] of entropy from the IPv4 Source Address field Google receives >=37 bits of entropy, which is almost certainly a unique ID for the browser. Linking that browser ID to a personal account is trivial as soon as someone logs in to any Google service.
Now this is interesting. Without those 13 bits of entropy, what would Google lose? Is it because of these 13 bits that Google is suddenly able to track what it could not before? If the IPv4 address, user-agent string, or some other behavior is sufficient to reveal a great deal, we have a more serious problem than those 13 bits. I agree that the 13-bit seed is a concern. But I am wondering whether it is a concern per se, or only in combination with something else. Of course, how and whether Google keeps that data also matters.
One clarification:
- By default it's much more than 13 bits of entropy
- If you disable usage statistics then you are limited to 13 bits of entropy
Actually, the low entropy provider is used for any field trials that get included in the header.
See: https://cs.chromium.org/chromium/src/components/variations/v...
>Now this is interesting. Without those 13 bits of entropy, what would Google lose? Is it because of these 13 bits that Google is suddenly able to track what it could not before?
At the very least, having those 13 bits of entropy along with a /24 subnet allows you to have device-level granularity, whereas a /24 subnet may be shared by hundreds of households.
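To put a rough number on that, here is a sketch of the chance that a given device shares its seed with at least one other Chrome install behind the same /24. It assumes seeds are uniform over 8000 values and independent of each other; the install counts are made-up examples, and the function name is hypothetical.

```python
def p_seed_shared(other_installs: int, n_seeds: int = 8000) -> float:
    """Probability that at least one of `other_installs` has the same seed as yours."""
    return 1.0 - (1.0 - 1.0 / n_seeds) ** other_installs

# Example install counts are illustrative only.
for others in (10, 250, 1000):
    print(f"{others:>5} other installs: {p_seed_shared(others):.1%} chance of a shared seed")
```

With 250 other installs in the subnet the chance of a collision is only about 3%, so under these assumptions the seed plus the /24 usually does single out one device.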
They have more than 13 bits of entropy
https://cs.chromium.org/chromium/src/components/metrics/entr...
Look at what the function is called: high-entropy source :)
> This ... header ... will not contain any personally identifiable information
Except for everything you do on your browser. I'm so glad I haven't used Chrome for almost three years.
Yes, if you have enough bits you can come up with a fingerprint, but that's not what PII means.
It becomes PII the instant you can correlate that fingerprint with any PII.
This.
A bank account number is considered PII. Knowing the bank name & account number will uniquely identify the account holder's name, which is PII.
Don't forget that just about any registration requires recaptcha these days
>Linking that browser ID to a personal account is trivial as soon as someone logs in to any Google service.
Wat? You mean to tell me they can identify you if you log into their service?
Am I missing something here? Who cares?
I care. I care that even if I log off, even if I use a VPN, even if I go into incognito mode, they can still associate my requests with the account I initially logged in to.
The problem is any website can do that. Incognito-bypassing fingerprinting is difficult to prevent, unless you use something like uMatrix to disallow JavaScript from everything but a few select domains.
This is a collection of random-ish unique-ish attributes. Any collection of such things can be used to track you, like installed fonts, installed extensions, etc. If this were just a set of meaningless encoded random numbers, then it's essentially a kind of cookie, but that's not what it is. This is (claimed to be) a collection of information that's useful and possibly needed by some backends when testing new Chrome features. It tells servers what your Chrome browser supports. The information is probably similar to "optimizeytvids=1,betajsparser=1".
So, the only question is if Google is actually using this to help fingerprint users in addition to the pragmatic use case. It certainly could be used that way, and it's possible they are, but they have so many other ways of doing that with much higher fidelity / entropy if they want to. If this were intended as a sneaky undisclosed fingerprinting technique, I think they would've ensured it was actually 100% unique per installation, with a state space in the trillions, rather than 8000.
Yes, this could be so sneaky that they took this into consideration and made it low-entropy to create plausible deniability while still being able to increase entropy when doing composite fingerprinting, but I think it's pretty unlikely. Also, 99% of the time they could probably just use Google Analytics and Google login cookies to do this anyway.
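A quick sketch of the state-space point: how many installs end up sharing each value for an 8000-value seed versus an ID with a state space in the trillions. The install count is a hypothetical round number, not a real figure, and uniform assignment is assumed.

```python
def expected_sharers(installs: int, state_space: int) -> float:
    """Average number of installs per value, assuming uniform assignment."""
    return installs / state_space

INSTALLS = 1_000_000_000  # hypothetical install count, not a real figure

for label, space in (("8000-value seed", 8000), ("trillion-value ID", 10**12)):
    print(f"{label}: ~{expected_sharers(INSTALLS, space):,.3f} installs per value")
```

On its own, the small seed leaves each value shared by a large crowd, while a trillion-value ID would leave almost every value with at most one install. That is the gap between "can contribute to a fingerprint" and "is an identifier by itself".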
I mean, if you don't want Google to track you, then you probably shouldn't use their browser...
I believe someone else in the thread stated it's cleared for incognito; I don't remember if they meant it's not sent or that it's a new value.
Normally you would only expect to be identified and tracked when using Google services while logged in. The significance of this post is that they would be able to identify and track you across all your usage of that browser installation, regardless of whether you've logged out or are, say, in an incognito window.
Ah. So I was missing something. Thanks for clarifying. That is alarming.
Yes, you are missing something important. Once they've tied the browser ID to your personal account they can track you across all Google properties, even the ones that you didn't log into.
Unless you're running some extension that emulates FF's container tabs or something, it logs you into all G services. It would matter, though, if this header is still sent in incognito sessions.
I still don't understand. When I log into gmail, it logs me into all Google services. If I am worried about being tracked, surely my first mistake is logging in in the first place? Or visiting in the first place? After all, even if I click "log out," I'm only trusting Google that they unlinked the browser state from the account. If I trust them to do that, I don't see why I shouldn't trust them to ignore this experiment flag from Chrome, or at least not use it for tracking. If I don't trust them to avoid using the experiment state, I don't really see how you can trust them for anything.
Anyway, if you're not building Chrome from source, then you have to trust that they aren't putting anything bad in it. And if you are building Chrome from source, you can observe that they only send this experiment ID to certain domains, and they already know who you are on those domains anyway.
If you browse the internet, they could know what websites are visited by the same person, but not who they are exactly.
If you visit a load of websites, then also log into google, they connect the two and they know what websites were visited by you specifically.
He means they can continue to identify you after you log off.
I think the argument is they have other methods like cookies they could also use. The fact you trust them not to use those methods extends to this form of tracking.