
Comment by carlsborg

6 years ago

If you strace Chrome on Linux, you'll see it also reads /etc/machine-id (or it did back when I looked). That file holds a randomly generated 128-bit ID, written as 32 lowercase hex characters, which uniquely identifies your machine and on some systems is used as the DHCP ID across reboots.
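For reference, the machine-id(5) man page specifies the file as a single newline-terminated line of 32 lowercase hexadecimal characters, which decodes to a 16-byte (128-bit) value. Here is a minimal Python sketch of reading and validating it, assuming only that format. `parse_machine_id` is a hypothetical helper, not part of any library, and the script deliberately prints only the size of the ID rather than the ID itself.

```python
import re

# Format per the machine-id(5) man page: one newline-terminated line of
# 32 lowercase hexadecimal characters, i.e. a 16-byte (128-bit) value.
MACHINE_ID_RE = re.compile(r"[0-9a-f]{32}")

def parse_machine_id(text: str) -> bytes:
    """Validate the contents of /etc/machine-id and return the raw 16 bytes."""
    value = text.rstrip("\n")
    if not MACHINE_ID_RE.fullmatch(value):
        raise ValueError("not a valid machine ID")
    return bytes.fromhex(value)

if __name__ == "__main__":
    try:
        with open("/etc/machine-id", encoding="ascii") as f:
            raw = parse_machine_id(f.read())
        # Only report the size; the ID itself is meant to stay confidential.
        print(f"{len(raw)} bytes / {len(raw) * 8} bits")
    except (OSError, ValueError) as exc:
        print(f"could not read a machine ID: {exc}")
```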

At first I thought reading /etc/machine-id would be expected if Chrome uses D-Bus, or PulseAudio libraries that depend on D-Bus, since /etc/machine-id is part of D-Bus. But no, they really use it for tracking purposes.

And in a sick twist they have this comment for it:

  std::string BrowserDMTokenStorageLinux::InitClientId() {
    // The client ID is derived from /etc/machine-id
    // (https://www.freedesktop.org/software/systemd/man/machine-id.html). As per
    // guidelines, this ID must not be transmitted outside of the machine, which
    // is why we hash it first and then encode it in base64 before transmitting
    // it.
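The comment describes hashing the machine ID and then base64-encoding it. A minimal Python sketch of that pipeline, assuming SHA-256 (the comment doesn't say which hash is used) and a made-up example ID, shows what it does and doesn't buy: the raw machine ID isn't transmitted, but the result is still a stable per-machine identifier, base64 is only an encoding that decodes straight back to the digest, and with no key, any other program that hashes the same machine ID the same way gets the same value.

```python
import base64
import hashlib

def unkeyed_client_id(machine_id: str) -> str:
    # Hash the trimmed machine ID, then base64-encode the digest.
    # SHA-256 is an assumption; the quoted comment does not name the hash.
    digest = hashlib.sha256(machine_id.strip().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

machine_id = "0123456789abcdef0123456789abcdef"  # hypothetical example ID
client_id = unkeyed_client_id(machine_id)

# Stable: the same machine always yields the same client ID.
assert client_id == unkeyed_client_id(machine_id + "\n")

# Base64 adds no secrecy: decoding returns the exact SHA-256 digest.
assert base64.b64decode(client_id) == hashlib.sha256(machine_id.encode("ascii")).digest()

print(client_id)
```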

  • In fairness, the guidelines they reference suggest you do exactly what the comment says they're doing (assuming they're keying the hash). The guidelines seem explicitly written with the idea that unique identifiers _derived from_ this value are not similarly quarantined, provided that you cannot take the derived value and "reverse" it back to the original identifier.

    Quoting from https://www.freedesktop.org/software/systemd/man/machine-id....:

    This ID uniquely identifies the host. It should be considered "confidential", and must not be exposed in untrusted environments, in particular on the network. If a stable unique identifier that is tied to the machine is needed for some application, the machine ID or any part of it must not be used directly. Instead the machine ID should be hashed with a cryptographic, keyed hash function, using a fixed, application-specific key. That way the ID will be properly unique, and derived in a constant way from the machine ID but there will be no way to retrieve the original machine ID from the application-specific one.

  • > which is why we hash it first and then encode it in base64 before transmitting it.

    This made me chuckle. "As per the rules, we'll put on a boxing glove before we punch your lights out." You won't get privacy, but at least there is some security!

    • > As per the rules, we'll put on a boxing glove before we punch your lights out

      This also made me chuckle

  • "Tracking purposes" is such a weasel word, when we're really talking about device management in an enterprise setting, and this code only gets activated if the root/administrator user has installed a token file on your computer.
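The keyed derivation that the freedesktop.org guideline quoted above asks for can be sketched with HMAC-SHA256 from Python's standard library. The key is a hypothetical placeholder for the "fixed, application-specific key", and truncating to 128 bits is a choice made here to keep the machine-id shape. systemd itself provides sd_id128_get_machine_app_specific() for this purpose; this sketch is not claimed to reproduce that function's output. What it shows is that two applications with different keys derive unrelated IDs from the same machine, so their identifiers can't be correlated with each other. A key compiled into one application does not stop that application from recognizing the machine, though; it only keeps different applications' IDs apart.

```python
import hashlib
import hmac

# Hypothetical fixed, application-specific key (16 random bytes chosen once
# by the application's developers and compiled in).
APP_KEY = bytes.fromhex("a1b2c3d4e5f60718293a4b5c6d7e8f90")

def app_specific_id(machine_id: str, app_key: bytes = APP_KEY) -> str:
    """Derive a stable per-application ID from the machine ID with HMAC-SHA256."""
    raw = bytes.fromhex(machine_id.strip())
    mac = hmac.new(app_key, raw, hashlib.sha256).digest()
    return mac[:16].hex()  # truncated to 128 bits to mirror the machine-id shape

machine_id = "0123456789abcdef0123456789abcdef"  # hypothetical example ID
other_app_key = bytes.fromhex("00112233445566778899aabbccddeeff")

id_app_a = app_specific_id(machine_id)
id_app_b = app_specific_id(machine_id, other_app_key)

# Stable within one application...
assert id_app_a == app_specific_id(machine_id + "\n")
# ...but unrelated across applications with different keys.
print("IDs differ:", id_app_a != id_app_b)
```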

The machine ID (among many other things) can be faked with firejail, if you absolutely have to run Chromium (e.g. for testing):

    --machine-id
        Spoof id number in /etc/machine-id file - a new random id is generated inside the sandbox.
    
        Example:
        $ firejail --machine-id
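Spoofing comes down to handing the sandboxed program a different, freshly random value in the same format. Below is a sketch of generating such a value in Python; it is not firejail's implementation. It sets the UUID version-4 and variant bits the way systemd's sd_id128_randomize() does, which is an assumption about what a convincing fake should look like rather than something the firejail text above specifies. Any ID derived from it, hashed or not, then differs from one sandbox to the next.

```python
import re
import secrets

def random_machine_id() -> str:
    """Return a fresh random 128-bit ID in /etc/machine-id format."""
    raw = bytearray(secrets.token_bytes(16))
    raw[6] = (raw[6] & 0x0F) | 0x40  # UUID version 4
    raw[8] = (raw[8] & 0x3F) | 0x80  # RFC 4122 variant
    return raw.hex()  # 32 lowercase hex characters

sandbox_a = random_machine_id()
sandbox_b = random_machine_id()
assert re.fullmatch(r"[0-9a-f]{32}", sandbox_a)
print("IDs differ:", sandbox_a != sandbox_b)
```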

When Puppeteer first came out, I was nervous about using it for scraping because I could easily see Chrome pulling tricks like this to help reCAPTCHA identify bots. I'm still not convinced they aren't.

That's not a correct description.

* http://jdebp.uk./Softwares/nosh/guide/commands/machine-id.xm...