
Comment by rdoherty

2 months ago

Skimming the list, looks like most extensions are for scraping or automating LinkedIn usage. Not surprising as there's money to be made with LinkedIn data. Scraping was a problem when I worked there, the abuse teams built some reasonably sophisticated detection & prevention, and it was a constant battle.

In order to create the data source that LinkedIn's extension-fingerprinting relies on to work, someone (at LinkedIn*?) almost certainly violated the Chrome Web Store TOS—by (perversely*) scraping it.

* if LinkedIn didn't get it from an existing data source

  • Programmers don't appreciate the fact that you can just violate terms of service. You can just do it. It's okay. The police won't come after you. Usually.

    • I think the point is more "in order to prevent people from scraping their site, which is against their ToS, they scraped some other site, against its ToS".


    • Indeed. I read a lot of comments like the one you are responding to on HN. It seems like there is a type of person who thinks that writing down what their rules are has some magical power.

      “This isn’t what it was intended for”. Who cares?

      A long long time ago in a galaxy far far away I would encounter warnings on pirating websites saying “If you are an FBI agent you are not allowed to continue on this site”. Imagine their utter disbelief and shock if they were to be arrested by an FBI agent that clicked past the warning anyway.

      I agree it must be programmers as a type who like rules a lot and think what a perfect world it could be if people would follow them.


  • 3000 extensions is few enough that a small team could download each extension manually over a few months. You don't need to scrape at all.

    • In the first place, no one said they needed to, only that they probably did.

      Secondly, it's not "3000 extensions". They didn't somehow magically divine that the 2953 (+/-47) extensions we see here were the ones that they needed to download in order to be able to exploit the web-accessible resources declared in each extension's manifest. They looked at a much larger set, and it got filtered down to these 2953 that satisfied the necessary criteria.
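
The filtering step described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code anyone actually used: `webAccessiblePaths` and `probeCandidates` are hypothetical names, the input is assumed to be already-parsed manifest objects, and the sketch ignores the `matches` restrictions that Manifest V3 attaches to each entry.

```javascript
// Hypothetical helper: list the web-accessible resource paths a manifest declares.
// Manifest V2 uses a flat array of strings; Manifest V3 uses an array of
// objects, each carrying its own `resources` array.
function webAccessiblePaths(manifest) {
  const entries = manifest.web_accessible_resources ?? [];
  const paths = [];
  for (const entry of entries) {
    if (typeof entry === "string") {
      paths.push(entry); // MV2 style
    } else if (entry && Array.isArray(entry.resources)) {
      paths.push(...entry.resources); // MV3 style
    }
  }
  return paths;
}

// Hypothetical filter: keep only extensions that expose at least one
// concrete (non-wildcard) path, since a wildcard alone names no file to request.
function probeCandidates(extensions) {
  return extensions
    .map(({ id, manifest }) => ({
      id,
      file: webAccessiblePaths(manifest).find((p) => !p.includes("*")),
    }))
    .filter((candidate) => candidate.file !== undefined);
}
```

Run over a large crawl of manifests, a filter of this shape would shrink the set to the extensions that can be probed at all, which is the point being made about the 2953 figure.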


A problem for LinkedIn != "a problem". The real problem for people is the back-room data brokering that LinkedIn and others do.

From the code, it doesn't look like they do anything if they have a match; they just save all the results to a CSV for fingerprinting?

  • "The code" here you're referring to (fetch_extension_names.js[1]) isn't and doesn't claim to be LinkedIn's fingerprinting code. It's a scraper that the researcher behind this repo wrote themselves in order to create the CSV of the data that they're publishing here.

    LinkedIn's fingerprinting code, as the README explains, is found in fingerprint.js[2], which embeds a big JSON literal with the IDs of the extensions it probes for. (Sickeningly enough, this data starts about two-thirds of the way through the file* and isn't the culprit behind the bulk of its 2.15 MB size…)

    * On line 34394; the one starting:

        const r = [{
                    id: "aacbpggdjcblgnmgjgpkpddliddineni",
                    file: "sidebar.html"
    

    1. <https://github.com/mdp/linkedin-extension-fingerprinting/blo...>

    2. <https://github.com/mdp/linkedin-extension-fingerprinting/blo...>
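
For readers unfamiliar with the general technique, probing a list of `{ id, file }` pairs like the one quoted above could look roughly like this. It is a minimal sketch and not LinkedIn's code: `detectExtensions` is a hypothetical name, and it assumes that requesting a `chrome-extension://` URL rejects when the extension is absent or does not expose that file to the page. `fetchImpl` is injectable only so the logic can be exercised outside a browser.

```javascript
// Hypothetical sketch: report which candidate extensions expose the listed file.
// Each candidate is an { id, file } pair, mirroring the shape of the quoted excerpt.
async function detectExtensions(candidates, fetchImpl = fetch) {
  const results = await Promise.all(
    candidates.map(async ({ id, file }) => {
      try {
        // Assumption: this request only resolves if the resource is reachable.
        await fetchImpl(`chrome-extension://${id}/${file}`);
        return { id, present: true };
      } catch {
        // Rejected request: extension not installed, or file not exposed.
        return { id, present: false };
      }
    })
  );
  return results.filter((r) => r.present).map((r) => r.id);
}
```

In a page context this would be called with the embedded list and the default `fetch`; the returned array of IDs is the kind of signal a fingerprint could be built from.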

Looking at the list, it does not seem really “sophisticated”. It is just a list based on names (e.g., whether there is an “email” in the name). The majority of the extensions do not even ask for permission to access linkedin.com.
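
A claim like the one above about permissions could be checked against each extension's manifest along these lines. This is a rough sketch under assumptions: `requestsLinkedInAccess` is a hypothetical name, the match-pattern handling is deliberately simplified to a few common forms, and it ignores `activeTab` and optional permissions, so it can undercount.

```javascript
// Hypothetical check: does a parsed manifest request access to linkedin.com?
function requestsLinkedInAccess(manifest) {
  const patterns = [
    ...(manifest.host_permissions ?? []), // Manifest V3 host permissions
    ...(manifest.permissions ?? []), // Manifest V2 mixes host patterns in here
    ...(manifest.content_scripts ?? []).flatMap((cs) => cs.matches ?? []),
  ];
  return patterns.some(
    (p) =>
      p === "<all_urls>" || // explicit all-sites access
      p.includes("linkedin.com") || // pattern naming the site
      /^(\*|https?):\/\/\*\//.test(p) // any-host patterns such as *://*/*
  );
}
```

Counting how many manifests return `false` here would give a first approximation of how many listed extensions never ask for linkedin.com at all.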

Won't someone think of poor little LinkedIn, a subsidiary of one of the largest data brokers in the world?

  • Why frame what you are trying to say like that? Businesses of all sizes deserve the ability to protect their businesses from abuse.

    • Do they respect my data? Why do they get to track me across sites when I clearly don't want them to, but someone can't scrape their data when they don't want them to? Why should big companies get the pass but individuals not? They clearly consider internet traffic fair game and are invasive and abusive about it, so it is not only fair to be invasive and abusive back, it is self-defense at this point.


    • I think they framed it this way because they don't consider scraping abuse (to be fair, neither do I, as long as it doesn't overload the site). Botting accounts for spam is clear abuse, however, so that's fair game.


    • I'm sure there are issues with fake accounts for scraping, but the core issue is that LinkedIn considers the data valuable. LinkedIn wants to be able to sell the data, or access to it at least, and the scrapers undermine that.

      They could stop all the scraping by providing a downloadable data bundle like Wikipedia.


    • What is abuse? Is it anything that reduces my profit margin? Or is it anything that makes the world a worse place? The Flock CEO called Deflock terrorism, is he right?

    • this exchange -- obvious critical / perhaps insurrection speech versus a stable voice of business economics -- should be within the purview of an orderly and predictable legal environment. BUT things moved quickly in the phone battles. Some people say that the legal system has never caught up to the data brokering, and in fact the surveillance state grew by leaps and bounds.

      So, reasonable people may disagree. This is a fine place to mention it: what if individual profiles built at LinkedIn are being combined with illegitimate and even directly illegal surveillance data and sold daily? Everyone stand up and salute when LinkedIn walks in the room? There have to be legal and direct ways to deal with change, and enforcement to complete an orderly and predictable economic marketplace.


    • We enjoy the fruits of an LLM or two from time to time, derived from hoards of ill-gotten data. LinkedIn has the resources to attempt to block scraping, but even at the resource scale of LI I doubt the effort is effective.


    • The big social media businesses deserve a Teddy Roosevelt character swooping in and busting their trusts, forcing them to play ball with others even if it destroys their moats. Boo hoo! Good riddance. World's tiniest violin.

      This is a popular position across the aisle. Here's hoping the next guy can't be bought, or at least asks for more than a $400M tacky gold ballroom!

  • I mean, regardless of who they are or even if you don’t like what LinkedIn does themselves with the data people have given them, the random third parties with the extensions don’t additionally deserve to just grab all that data too, do they?

    • Eh. I worked at a company which made an extension which scraped LinkedIn. We provided a service to recruiters, who would start a hiring process by putting candidates into our system.

      The recruiters all had LinkedIn paid accounts, and could access all of this data on the web. We made a browser extension so they wouldn’t need to do any manual data entry. Recruiters loved the extension because it saved them time.

      I think it was a legitimate use. We were making LinkedIn more useful to some of their actual customers (recruiters) by adding a somewhat cursed API integration via a Chrome extension. Forcing recruiters to copy and paste didn’t help anyone. Our extension only grabbed content on the page the recruiter had open. It was purely read-only and scoped by the user.
