Google Safe Browsing can kill a startup

5 years ago (gomox.medium.com)

This is actually funny, because I was involved with the creation of this list, way back in 2004. The whole thing started as a way to stop phishing.

I was working at eBay/PayPal at the time, and we were finding a bunch of new phishing sites every day. We would keep a list and try to track down the owners of the (almost always hacked) sites and ask them to take it down. But sometimes it would take weeks or months for the site to get removed, so we looked for a better solution. We got together with the other big companies that were being phished (mostly banks) and formed a working group.

One of the things we did was approach the browser vendors and ask: if we provided them with a blacklist of phishing sites, which we already had, would they block those sites at the browser level?

For years, they said no, because they were worried about the liability of accidentally blocking something that wasn't a phishing site. So we all agreed to promise that no site would ever be put on the list without human verification and the lawyers did some lawyer magic to shift liability to the company that put a site on the list.

And thus, the built in blacklist was born. And it worked well for a while. We would find a site, put it on the list, and then all the browsers would block it.

But since then it seems that they have forgotten their fear of liability, as well as their promise that all sites on the list will be reviewed by a human. Now that the feature exists, they have found other uses for it.

And that is your slippery slope lesson for today! :)

  • This is an amazing story. It really demonstrates the way we pave our road to hell with good intentions...

    We should really do something about this issue, where so few companies (arguably, a single one) hold so much power over the most fundamental technology of the era.

    • Hear, hear! I really wish there was more human involvement in a lot of these seemingly arbitrary AI-taken actions. Everything from app review to websites and more. This heavy reliance on automated systems has led us down this road. Shoot, keep it, just give us the option to guarantee human review - with, of course, transparency. We don't need any more "some human looked at this and agreed, the decision is final, goodbye."

      I know it's easier said than done, especially when taking the scale of the requests into account, but the alternative has done, does, and will continue to do serious harm to the many people and businesses caught in this wide, automated net.

      14 replies →

    • Are you implying that the list no longer has a good intention? I wouldn't be surprised if there are multiple orders of magnitude more phishing and hacked websites in 2021 than there were in 2004. Even with human checking, I doubt you'd ever have a 0% failure rate. Is the solution to just give up on blocking phishing sites?

      16 replies →

    • The solution is simple: Liability. As soon as it becomes legally infeasible to let algorithms block people, it will stop happening.

      Make it easy and affordable to submit legal complaints for tech misbehavior and make the penalties hurt.

      5 replies →

    • This was the case with railroads too: only a few controlled the biggest, most transformative, business-integral tech of the 1800s.

      Prior to that it was those that controlled the printing presses.

      ...

      History continues to repeat itself.

  • I've just read something similar in Zero to One (by Peter Thiel with Blake Masters). Peter argues that computers can't replace humans - it'd be foolish to expect that for at least the coming decades; strong AI replacing humans is a problem for the 22nd century. He proposes complementarity instead, and points to a successful implementation of this idea in PayPal's fraud detection system way back in 2002, when purely automated detection algorithms were quickly overcome by determined fraudsters. He went on to found Palantir based on the same idea.

    >>> In mid-2000, we had survived the dot-com crash and we were growing fast, but we faced one huge problem: we were losing upwards of $10 million to credit card fraud every month. Since we were processing hundreds or even thousands of transactions per minute, we couldn’t possibly review each one—no human quality control team could work that fast. So we did what any group of engineers would do: we tried to automate a solution. First, Max Levchin assembled an elite team of mathematicians to study the fraudulent transfers in detail. Then we took what we learned and wrote software to automatically identify and cancel bogus transactions in real time. But it quickly became clear that this approach wouldn’t work either: after an hour or two, the thieves would catch on and change their tactics. We were dealing with an adaptive enemy, and our software couldn’t adapt in response. The fraudsters’ adaptive evasions fooled our automatic detection algorithms, but we found that they didn’t fool our human analysts as easily. So Max and his engineers rewrote the software to take a hybrid approach: the computer would flag the most suspicious transactions on a well-designed user interface, and human operators would make the final judgment as to their legitimacy. Thanks to this hybrid system—we named it “Igor,” after the Russian fraudster who bragged that we’d never be able to stop him—we turned our first quarterly profit in the first quarter of 2002 (as opposed to a quarterly loss of $29.3 million one year before). The FBI asked us if we’d let them use Igor to help detect financial crime. And Max was able to boast, grandiosely but truthfully, that he was “the Sherlock Holmes of the Internet Underground.” This kind of man-machine symbiosis enabled PayPal to stay in business, which in turn enabled hundreds of thousands of small businesses to accept the payments they needed to thrive on the internet. 
    None of it would have been possible without the man-machine solution—even though most people would never see it or even hear about it.
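
The flag-then-review approach Thiel describes can be sketched in a few lines. This is a toy illustration, not PayPal's actual system: the transaction fields, scoring rules, and threshold are all invented for the example.

```python
# Minimal sketch of the hybrid "Igor" idea: software scores transactions,
# auto-approves the clearly safe ones, and queues the rest for a human.
# All field names and weights below are made up for illustration.

def risk_score(txn: dict) -> float:
    """Crude heuristic score; a real system would use learned models."""
    score = 0.0
    if txn["amount"] > 1000:
        score += 0.4  # large transfers are riskier
    if txn["account_age_days"] < 7:
        score += 0.3  # brand-new accounts are riskier
    if txn["country"] != txn["card_country"]:
        score += 0.3  # geographic mismatch between user and card
    return score

def triage(transactions, threshold=0.5):
    """Auto-approve low-risk transactions; send the rest to human review."""
    approved, review_queue = [], []
    for txn in transactions:
        (review_queue if risk_score(txn) >= threshold else approved).append(txn)
    return approved, review_queue
```

The key design point from the excerpt is that the computer never makes the final negative judgment; it only narrows billions of events down to a queue small enough for human analysts.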

  • Liability was my first thought. How is an assertion that a site contains malware not libel? The site would easily be able to demonstrate lost revenue.

  • Can someone dig out that old agreement to see if Google can be sued big time for this?

    I doubt it but I must say it would make me happy and that would be weird because Schadenfreude normally isn't my thing.

  • > since then it seems that they have forgotten their fear of liability

    They most likely have offloaded the liability to a “machine learning algorithm”. It’s easy for companies to point the finger at an algorithm instead of them taking responsibility.

    • Which then leads them to the awkward place of having to be transparent about how their algorithms work.

      Either take responsibility, or be transparent.

      But we all want to have our cake and eat it too.

      1 reply →

  • Simple solution: let Google use their imperfect (false-positive-prone) filter, allow them to collect $12/year not to be blacklisted, and have Google send all revenue to the Electronic Frontier Foundation or similar internet-defending foundations.

    • Another road to hell paved with good intentions. Once everyone’s paying, who’s to stop them from pocketing the money instead?

      “After careful review, we’ve concluded that the Electronic Frontier Foundation no longer aligns with the goals of Google or its parent company Alphabet Inc. to the extent we require from recipients of our Freedom Fund. We will place these funds in a separate account and use them in ways we believe will be in the best interest of digital freedom, both now and in the future.”

      2 replies →

  • "For years, they said no, because they were worried about the liability of accidentally blocking something that wasn't a phishing site."

    Can anyone explain how a web browser author could be liable for using a blacklist? Once past the disclaimer in uppercase that precedes every software install, and past the Public Suffix (White)List that browsers include, how do you successfully sue the author of a software program, a web browser, for shipping a domain-name blacklist? Spamhaus was once ordered to pay $11 million for blacklisting some spammers, but that did not involve a contractual relationship, e.g., a software license, between the spammers and Spamhaus.

    • I think the situation is actually exactly like the Spamhaus case you describe: it wouldn't be the browser user that sues, but the blocked website's owner. The website's owner need not have accepted any kind of agreement from the browser maker in order to be harmed by the blocklist.

      1 reply →

  • That's very interesting. Would you not think for a moment that such a mechanism could be abused?

    • The internet was a much kinder, more trusting place back then. When the browser makers agreed not to use it for bad things, we believed them.

      2 replies →

After years of seeing developments like this, getting worse and worse, it fills me with rage to think about how clearly nobody in power at Google cares.

I naively used to think, "they probably don't realize what's happening and will fix it." I always try to give benefit of the doubt, especially having been on the other side so many times and seeing how 9 times out of 10 it's not malice, just incompetence, apathy, or hard priority choices based on economic constraints (the latter not likely a problem Google has though).

At this point however, I still don't think it's outright malice, but the doubling down on these horrific practices (algorithmically and opaquely destroying people) is so egregious that it doesn't really matter. As far as I'm concerned, Google is to be considered a hostile actor. It's not possible to do business on the internet in any way without running into them, so "de-Googling" isn't an option. Instead, I am personally going to (and will advise my clients to):

Consider Google as a malicious actor/threat in the InfoSec threat modeling that you do. Actively have a mitigation strategy in place to minimize damage to your company should you become the target of their attack.

As with most security planning/analyzing/mitigation, you have to balance the concerns of the CIA Triad. You can't just refuse Google altogether these days, but do NOT treat them as a friend or ally of your business, because they are most assuredly NOT.

I'm also considering AWS and Digital Ocean more in the same vein, although that's off topic on this thread. (I use Linode now as their support is great and they don't just drop ban hammers and leave you scrambling to figure out what happened).

Edit: Just to clarify (based on confusion in comments below), I am not saying Google is acting with malice (I don't believe they are personally). I am just suggesting you treat it as such for purposes of threat modeling your business/application.

  • Walter Jon Williams, circa 1987, wrote a story of far-flung humanity's future in "Dinosaurs," in which humans had been engineered into a variety of specialized forms to better serve humanity. After nine million years of tweaking, most of them are not too bright, but they are perfect at what they do. Ambassador Drill is trying to prevent a newly discovered species, the Shar, from treading on the toes of humanity, because if the Shar have even a slight accidental conflict with humans (say, human terraforming ships wiping out Shar colonies because they just didn't notice them), the rather terrifyingly adapted military subspecies branches of humanity will utterly wipe out the Shar, as they have efficiently done to so many others, just as a reflex. Ambassador Drill fears that negotiations, despite his desire for peace, may not go well, because the terraforming ships will take a long time to receive the information that the Shar are in fact sentient and billions of them ought not to be wiped out ...

    Google, somehow, strikes me as this vision of humanity, but without an Ambassador Drill. It simply lumbers forward, doing its thing. It is to be modeled as a threat not because it is malign, but because it doesn't notice you exist as it takes another step forward. Threat modeling Lovecraft-style: entities that are alien and unlikely to single you out in particular, it's just what they do is a problem.

    Google's desire for scale, scale, scale, meant that interactions must be handled through The Algorithms. I can imagine it still muttering "The algorithms said ..." as anti-trust measures reverse-Frankenstein it into hopefully more manageable pieces.

    • > Google's desire for scale, scale, scale, meant that interactions must be handled through The Algorithms

      That's fine when you're a plucky growth startup. Less fine when you run half the internet.

      If Google doesn't want to admit it's a mature business and pivot into margin-eating, but risk-reducing support staffing, then okay: break it back up into enough startup-sized chunks that the response failure of one isn't an existential threat to everyone.

      11 replies →

    • > It simply lumbers forward, doing its thing. It is to be modeled as a threat not because it is malign, but because it doesn't notice you exist as it takes another step forward.

      This is a concept that I think deserves more popular currency. Every so often, you step on a snail. People actually hate doing this, because it's gross, and they will actively seek to avoid it. But that doesn't always work, and the fact that the human (1) would have preferred not to step on it; and (2) could, hypothetically, easily have avoided doing so, doesn't make things any better for the snail.

      This is also what bothers me about people who swim with whales. Whales are very big. They are so big that just being near them can easily kill you, even though the whales generally harbor no ill intent.

      3 replies →

    • > “You will have killed us,” Gram said, “destroyed the culture that we have built for thousands of years, and you won’t even give it any thought. Your species doesn’t think about what it does any more. It just acts, like a single-celled animal, engulfing everything it can reach. You say that you are a conscious species, but that isn’t true. Your every action is... instinct. Or reflex.

      Good story. I can imagine what the specialized humans did to the generalist humans eons ago.

    • Except in our case, Google's terraforming ships couldn't care less. It's just not part of their programming that there might be some intelligent life out there worth caring about that might be hurt by their actions, so there's no way for them to receive this information. It's not that it's hard to explain, there's nobody to explain it to.

    • Modern large corporations are just more inefficient, less effective paperclip maximizers, with humans gumming up the works.

      Google is striving hard to remove the "human" part of the problem.

      1 reply →

    • You're making another perfect case for why Google should be broken up. It's important that we can choose again.

  • "never attribute to malice that which is adequately explained by stupidity" and all that, but after the events and the almost perfectly orchestrated behavior we've seen in the past and last couple of weeks it's becoming increasingly difficult, at least to me, to not attribute this to malice. Probably deliberate negligence is a better term. They know their systems can make mistakes, of course they do, and yet they build many of their ban-hammers and enforce them as if hat wasn't the case.

    This approach to systems engineering is the technological equivalent of the personality trait I most abhor: the tendency to jump quickly to conclusions and not be skeptical of one's own world-view.

    [1] https://en.m.wikipedia.org/wiki/Hanlon%27s_razor#cite_note-m...

    • "Consciously malicious" is not a good rule of thumb standard to measure threats to yourself or your business; it only accounts for a tiny bit of all possible threats. GP isn't claiming that Google is consciously malicious, they are claiming that you should prepare as if they were. These are not the same thing.

      A lion may not be malicious when it's hunting you, it's just hungry; look out for it anyway. A drunk driver is unlikely targeting you specifically; drive carefully anyways. Nobody at Google is specifically thinking "hehehe now this will ruin jdsalareo's business!" but their decisions are arbitrary, generally impossible to appeal, and may ruin you regardless; prepare accordingly.

      3 replies →

    • I think mistakes just happen and are possibly just as helpful as they are harmful to Google. If they find something they particularly hate or find damaging, they can just "oops" their way to the problem being gone. Take Firefox [1]: each time a service went "oops" on Firefox, Chrome gained market share.

      I have no doubt they'd use similar "oops" for crushing a new competitor in the ad space. Or perhaps quashing a nascent unionizing effort. It's all tinfoil of course because we don't have any public oversight bodies with enough power to look into it.

      [1] https://www.techspot.com/news/79672-google-accused-sabotagin...

      1 reply →

    • Well, I think the stupidity and laziness is exacerbated by their ill will towards customers and users. This is also what prevents them from reforming. The general good will and sense of common purpose was necessary in Google's early days when they portrayed themselves as shepherds of the growth of the web. Now they are more like feudal tax collectors and census takers. Sure they are mostly interested in extracting their click-tolls, but sometimes they just do sadistic stuff because it feels good to hurt people and to be powerful. Any pseudo-religious sense of moral obligation to encourage 'virtuous' web practices has ossified, decayed, been forgotten, or been discarded.

      9 replies →

    • >”never attribute to malice that which is adequately explained by stupidity"

      I keep reading this on the internet as if it’s some sort of truism, but every situation in life is not a court where a prosecutor is trying to prove intent.

      There is insufficient time and resources to evaluate each and every circumstance to determine each and every causative factor, so we have to use heuristics to get by and make the best guesses. And sometimes, even many times, people do act with malice to get what they want. But they’re obviously not going to leave a paper trail for you to be able to prove it.

      18 replies →

    • Employees and managers at Google get promoted by launching features and products. They're constitutionally incapable of fixing problems caused by over-active features for the same reason they've launched seven different chat apps.

      1 reply →

    • I personally find Hanlon's Razor to be gratuitously misapplied. Corporate strategy is often better described as weaponized willful ignorance. You set up a list of problems that shall not be solved or worked on, and that sets the tone of interaction with the world.

      Plus, financial incentives create oh so many opportunities for things to go wrong or be outright miscommunicated that it is not even funny.

    • Thanks, I totally agree. Just to be clear I'm not saying it's malice as I don't believe that. I'm just saying the end result is the same so one should consider them a hostile actor for purposes of threat modeling.

      Given you're the second person who I think took away that I was accusing them of malice, I probably need to reword my post a bit to reduce confusion.

      Accusing them of malice is irresponsible without evidence, and if I were doing that it would undermine my credibility (which is why I'm pointing this out).

      1 reply →

    • Your comment made me think that they have the same attitude with support as they do with hiring, they are ok with a non fine-tuned model as long as the false positives / negatives impact individuals rather than Google’s corporate goals.

    • I would argue that consistent behavior defeats the benefit of the doubt, or the excuse of involuntary stupidity. Also, I believe most good-sounding quotes are easy to remember but not backed by much truth.

  • Author here. I don't think it's malice on their part, but their hammer is too big to be wielded so carelessly.

    • Yes I agree with you (and thank you for your medium post by the way. Our only chance of ever improving the situation is to call attention to it. I fully believe Google leadership has to be aware of it at this point, but it clearly won't be a priority to them to fix until the public backlash/pressure is great enough that they have to).

      Just to avoid any misreading, I didn't say I thought it was malice on Google's part. My opinion (as mentioned above) is:

      > I still don't think it's outright malice, but the doubling down on these horrific practices (algorithmically and opaquely destroying people) is so egregious that it doesn't really matter.

      So they are not (at least in my opinion without seeing evidence to the contrary) outright malicious. But from the perspective of a site owner, I think they should be considered as such and therefore mitigations and defense should be a part of your planning (disaster recovery, etc).

      1 reply →

    • Have you considered not using a 3rd party for hosting your JavaScript? There is always going to be some risk if the code isn’t under your control.

    • Is this list only maintained by Google? Do Firefox and Bing use the same list, is their process better/different? Is there any sharing happening?

      1 reply →

    • Great article. It’s not malice, it’s indifference.

      Google’s execs and veeps don’t care about small businesses, because most are career ladder climbers who went straight from elite colleges to big companies. Conformists who won’t ever know what it’s like to be a startup. As a group, empathy isn’t a thing for them.

      3 replies →

    • That is malice.

      Accidentally unleashing a process that harms people is negligence. Not caring that you are being negligent is malice.

    • IMHO, it sounds like it worked. The things you changed sound like they've made your site more secure. In the future, Google's hammer can be a bit more precise since you've segregated data.

      And you don't know what triggered it. It's possible that one of your clients was compromised or one of their customers was trying to use the system to distribute malware.

      5 replies →

  • It's probably "scale thinking" that makes google seem like they don't care: Everything is huge when you're "at scale"; the impact of a small blunder can take down companies or black out nation states. It's part of the game of being "at scale". They probably believe that it's untenable to build the necessary infrastructure to where everything (website, startup, person, etc.) matters.

    This will sound crass, but it reminds me of Soviets cutting off the food supply to millions of people over the winter, due to industrial restructuring, and they brushed it off as "collateral damage".

  • Of course they care. They've taken over everything they've been able to take over and they're still going strong. This is not by mistake. They just care about different things than you do. This is why Google needs to be broken up.

  • > I am not saying Google is acting with malice (I don't believe they are personally)

    I'd agree. The problem is there is no financial or regulatory incentive to do the right thing here.

    It has zero immediate impact on their bottom line to have things work in the current fashion, and the longer term damage to their reputation etc. is much harder to quantify.

    There's no incentive for them to fix this, so why would they?

  • They're never gonna care. They aren't incentivized to care. The only thing that can change the situation is the power of the American federal government, which needs to break Alphabet into 20-50 different companies.

  • > nobody in power at Google cares

    My assessment might be “nobody in power has time to prevent the myriad of problems happening all of the time, even though they handle the majority, with help from businesses, government agencies, etc., and given the huge impact of some problems to society as a whole, they may feel as though they’re riding in the front seat of a roller coaster, unaware of your single voice among billions from the ground down below.”

  • > they probably don't realize what's happening and will fix it

    “If only the czar knew!”

  • I'm with you on the rest, but what has DO done to lose the benefit of the doubt?

    Also, to your point, an organization becomes something other than the sum of its parts, especially the bigger it gets.

    Google can be a malicious actor without any individual in it necessarily acting maliciously.

    • Yeah that's a fair question. I had a bad personal experience with them, but I've also seen plenty of issues too. There was a big one a little while ago about how Digital Ocean destroyed somebody's entire company by banning them with AI: https://twitter.com/w3Nicolas/status/1134529316904153089

      In their defense, they acknowledged it and made some changes. I can't find the blog post now, so I'm going from memory. But that only happened because he got lucky: it blew up on HN/Twitter and got the attention of leadership at DO. How many people have been destroyed in silence?

      In my case, Digital Ocean only allows one payment card at a time and my customer (for whom the services were running) provided me with a card that was charged directly.

      A couple of months later, my customer forgot that he had provided the card. He didn't recognize "Digital Ocean", thought he had been hacked (which has happened to him before), called the bank, and filed a chargeback.

      When DO got the chargeback, they emailed me and also completely locked my account, so I was totally unable to access the UI or API. I didn't find out about the locked account until the next day. I responded to the email immediately and called my customer, who apologized and called the bank to reverse the chargeback. I was as responsive as they could have asked for.

      The next day I needed to open a port in the firewall for a developer to do some work. I was greeted with the dreaded "account locked" screen. I emailed them, begging and pleading with them to unlock my account. They responded that they would not unlock the account until the chargeback reversal had cleared. Research showed that it can take weeks for that to happen.

      I emailed again explaining that this was totally unacceptable. It is not ok to have to tell your client "yeah sorry I can't open that firewall port for your developer because my account is locked. Might be a couple of weeks." After a day or so, they finally responded and unlocked my account. Fortunately they didn't terminate my droplets, but I wonder what would have happened if I had already started using object storage as I had been planning. This was all over about $30 by the way.

      After that terrifying experience, I decided staying on DO was just too risky. Linode's pricing is nearly identical and they have mostly the same features. Prior to launching my new infrastructure I emailed their support asking about their policy. They do not lock accounts unless the person is long-term unresponsive or has a history of abuse.

      I've talked with Linode support several times and they've always been great. They're my go to now.

      2 replies →

  • I'd go a step further and claim that most tech companies are ultimately a threat to people's freedom and happiness. Not the tech itself, but the people that wield and profit from it.

  • They care, but the dominant policy in Google's calculus about what features should be released is "Don't let the exceptional case drown the average case." A legitimate SaaS providing business to customers might get caught by this. But the average case is it's catching intentional bad actors (or even unintentional bad actors that could harm the Chrome user), and Google isn't going to refrain from releasing the entire product because some businesses could get hit by false positives. They'd much rather release the service and then tune to minimize the false positives.

    To my mind, one of the big questions about mega corporations in the internet service space is whether this criterion for determining what can be launched is sufficient. It's certainly not the only criterion possible; contrast the standard for a US criminal trial, which attempts to evaluate "beyond a reasonable doubt" (i.e. tuned to be tolerant of false negatives in the hope of minimizing false positives). But Google's criterion is unlikely to change without outside influence, because on average, companies that use this criterion will get product to market faster than companies that play more conservatively.

    • Nah-- I think you've got it all wrong. The problem isn't the false positive/false negative ratio chosen.

      The problem is that there's false positives with substantial harm caused to others and with little path left open to them by Google to fix them / add exceptions-- in the name of minimizing overhead.

      Google gets all of the benefit of the feature in their product, and the cost of the negatives is an externality borne by someone else that they shrug off and do nothing to mitigate.

      1 reply →

  • Google Safe Search is only half the story. Another huge problem is Google's opaque and rash decisions about what sites throw up warnings in Chrome.

    I once created a location-based file-transfer service called quack.space [0] very similar to Snapdrop, except several years before they existed. Unfortunately the idiot algorithms at Chrome blocked it, throwing up a big message that the site might contain malware. That was the end of it.

    I had several thousand users at one point, thought that one day I might be able to monetize it with e.g. location based ads or some other such, but Google wiped that out in a heartbeat with a goddamn Chrome update.

    People worry about AI getting smart enough to take over humans. I worry about the opposite. AI is too stupid today and is being put in charge of things that humans should be in charge of.

    [0] https://www.producthunt.com/posts/quack-space

    [1] https://snapdrop.net/

  • Google has a lot of control of the Web.

    Much less control of the Internet.

    One lesson is to use IP, not the Web.

  • > I use Linode now as their support is great and they don't just drop ban hammers and leave you scrambling to figure out what happened.

    Linode once gave me 48 hours to respond (with threats to take down the site) because a URL was falsely flagged by netcraft based on what looked like an automated security scan of software I was hosting. Granted, they did not take any action and dropped the report once I pointed out that it was bullshit, but I do not consider this great service. If there is no real evidence of wrongdoing I should not be receiving ultimatums.

  • (Googler)

    You are only focusing on the negatives while completely ignoring the positives here.

    Here are a few questions to consider that may give you better perspective:

    1) Do you know the magnitude of financial and psychological damage caused by malware, phishing, etc on the web?

    2) Do you believe that it is possible to have a human review every piece of automation generated malware on the internet?

    3) Do you believe it is possible to build an automated system that provides value with zero false positives?

    4) Do you think an open standards body or government bureau would perform any better at implementing protections from the threats described here?

    • Author here - I don't underestimate the complexity of the task that Google Safe Browsing tries to accomplish.

      But: Do you believe there is no room for improvement in an automated, opaque system with clear evidence of malfunction, one that summarily decides whether hundreds of people go unemployed when their company tanks for nothing more than an incorrectly set threshold on some algorithm?

      That is the real question to ask. Google is nowhere near its limits in terms of capability, as is made abundantly clear by its extremely comfortable financial position.

      2 replies →

    • 2*) Do you believe that it is possible to have a human review every FALSE POSITIVE result from automated malware detection on the internet, when reported by those adversely affected by the false positive result?

      Yes, yes I do. Banks do it for their customers today at scale.

      2 replies →

  • Your clients will hate you for this as you are creating false positives. Sure, Google is sometimes unethical, but calling them a malicious actor? Really?

  • Following "Consider Google as a malicious actor/threat" with "I am not saying Google is acting with malice" is probably a strong indicator that you should have thought it through before posting it.

    • "Consider as" does not mean "is". Your lack of reading comprehension is not the fault of the poster.

It's a relatively long article, but it does not answer one simple question that is quite important to this discussion: were there any malicious files hosted on that semi-random CloudFront URL? I realise that Google did not provide help identifying it, but that does not mean one should simply recommission the server under a new domain and continue as if nothing had happened!

From TFA:

> We quickly realized an Amazon Cloudfront CDN URL that we used to serve static assets (CSS, Javascript and other media) had been flagged and this was causing our entire application to fail for the customer instances that were using that particular CDN

> Around an hour later, and before we had finished moving customers out of that CDN, our site was cleared from the GSB database. I received an automated email confirming that the review had been successful around 2 hours after that fact. No clarification was given about what caused the problem in the first place.

Yes, yes, Google Safe Browsing can use its power to wipe you off the internet, and when it encounters a positive hit (false or true!) it does so quite broadly. But that is also exactly what is expected for a solution like this to work, and it will do it again if the same files are hosted under a new URL, as soon as it detects the problem again.

  • Author here. Nothing was fixed, and the blacklist entry was cleared upon requesting a review, with no explanation.

  • They seem to be unable to answer this question since Google provided no URL. Without knowing what is considered malicious, how could they check if there was anything? What if it is a false positive?

  • I am just guessing here, but if the author's service was compromised, maybe they can't disclose the information. It feels like they know what they are doing, and at least to me, reading between the lines, it looks like they fixed their problem and advise people to fix theirs too:

    > If your site has actually been hacked, fix the issue (i.e. delete offending content or hacked pages) and then request a security review.

    • Author here. We didn't do anything other than request the flag to be reviewed.

      The recommended steps for dealing with the issue listed in the article were not what we used, just a suggested process that I came up with when putting the article together. Clearly, if the report you receive from Google Search Console is correct and actually contains malware URLs, the correct way to deal with the situation is to fix the issue before submitting it for review.

  • Yes, I guess if you're allowing users to upload arbitrary files that may contain viruses or malware, and you're not scanning the files, that makes you a potential malware host. That's how Google may see it. They're trying to protect their users, and you've created a vector for infection.

  • Whether or not this author's site was or was not hosting malicious content is irrelevant to the thrust of the article, which is that due to browser marketshare, Google has a vast censorship capability at the ready that nobody really talks about or thinks about.

    Think about a government in whose jurisdiction Google operates deciding that it wants to force Google to shut down certain websites, ones corresponding to apps it has already had Google and Apple ban from their app stores, for "national security" or whatever.

    This is one mechanism for achieving that.

  • If there was malicious content, the search console would have provided a sample URL. It didn't.

Our company [0] was also hit by this too.

We receive email for our customers and a portion of that is spam (given the nature of email). Google decided out of the blue to mark our attachment S3 bucket as dangerous, because of one malicious file.

What's most interesting is that the bucket is private, so the only way they could identify that there is something malicious at a URL is if someone downloads it using Chrome. I'm assuming they make this decision based on some database of checksums.

To mitigate, we now operate a number of proxies in front of the bucket, so we can quickly replace any that get marked as dangerous. We also now programmatically monitor presence of our domains in Google's "dangerous site" database (they have APIs for this).

0: https://www.enchant.com - software for better customer service
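The programmatic monitoring described above can be done against Google's Safe Browsing Lookup API (v4). A rough sketch, assuming a valid API key from the Google Cloud console; the client ID and function names here are hypothetical:

```python
import json
import urllib.request

# Endpoint for the Safe Browsing Lookup API (v4)
API_URL = "https://safebrowsing.googleapis.com/v4/threatMatches:find"

def build_lookup_request(urls, client_id="my-monitor", client_version="1.0"):
    """Build the JSON body for a threatMatches:find lookup."""
    return {
        "client": {"clientId": client_id, "clientVersion": client_version},
        "threatInfo": {
            "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING", "UNWANTED_SOFTWARE"],
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": u} for u in urls],
        },
    }

def check_urls(urls, api_key):
    """POST the lookup; an empty response body means no matches."""
    body = json.dumps(build_lookup_request(urls)).encode()
    req = urllib.request.Request(
        f"{API_URL}?key={api_key}", data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read() or "{}")
```

Run on a schedule against your own domains and alert on any non-empty response, so you hear about a flag before your customers do.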

  • Author here. I'm not sure exactly how they actually decide to flag. Alternatively, Amazon might somehow be reporting files in S3 onto the Google blacklist.

    It would seem surprising, but it's the other possibility.

  • > What's most interesting is that the bucket is private, so the only way they could identify that there is something malicious at a URL is if someone downloads it using Chrome. I'm assuming they make this decision based on some database of checksums.

    Doesn't Chrome upload everything downloaded to VirusTotal (a Google product)?

    • > Doesn't Chrome upload everything downloaded to VirusTotal (a Google product)?

      It doesn't, unless you opt into Safe Browsing "Enhanced Protection" or enable "Help improve security on the web for everyone" under "Standard Protection". Both are off by default, IIRC. Without them, Chrome periodically downloads what amounts to a bloom filter of "potentially unsafe" URLs/domains.

      On the other hand, GMail and GDrive do run the checks via VirusTotal, as far as we know, which means that the OP's case may have been caused by some of the recipients having their incoming mail automatically scanned. It's similar for Microsoft's version (FOPE users provide input for Defender SmartScreen), at least as of the last time I checked.

      5 replies →

    • The hashes of all things that match a "probably evil" bloom filter, yes.

      Hosting a virus on a domain and then downloading it a few times with different chrome installations sounds like a good way to get the whole domain blacklisted...

      15 replies →

    • Sounds rather too resource-intensive? I've just tried with current Chrome on Windows and a 32MB zip on my personal domain, Wireshark says the file has not been sent anywhere.

      4 replies →
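For the curious, the "bloom filter of hash prefixes" mentioned in the thread above works roughly like this: the client keeps only short SHA-256 prefixes of known-bad canonicalized URLs, and a prefix hit triggers a server round-trip for the full hashes before anything is blocked. A toy sketch; the class and sample list are illustrative, not Chrome's actual on-disk format:

```python
import hashlib

PREFIX_LEN = 4  # bytes; Safe Browsing mostly uses 4-byte SHA-256 prefixes

def url_hash(canonical_url: str) -> bytes:
    """Full SHA-256 of a canonicalized URL expression."""
    return hashlib.sha256(canonical_url.encode()).digest()

class LocalPrefixList:
    """Toy local blocklist: holds only hash prefixes, not full URLs."""

    def __init__(self, bad_urls):
        self.prefixes = {url_hash(u)[:PREFIX_LEN] for u in bad_urls}

    def maybe_unsafe(self, url: str) -> bool:
        # A prefix hit is NOT a verdict: the real client then asks the
        # server for the full 32-byte hashes sharing this prefix and
        # compares locally, so Google never learns most browsed URLs.
        return url_hash(url)[:PREFIX_LEN] in self.prefixes

blocklist = LocalPrefixList(["evil.example/payload.exe"])
print(blocklist.maybe_unsafe("evil.example/payload.exe"))  # True
print(blocklist.maybe_unsafe("news.ycombinator.com/"))
```

The prefix scheme is why false positives at the prefix level are possible in principle, and why only the full-hash confirmation step makes the final call.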

Yes, the power of something like Google Safe Browsing is scary, especially if you consider the many many downstream consumers who might have an even worse update / response time. Responsiveness by Google is not great, as expected, we recently contacted Google to get access to the paid WebRisk API and haven't heard anything in a few months...

However, phishing detection and blocking is not a fun game to be in. You can't work with warning periods or anything like that: phishing websites are stood up and immediately active, so you have to act within minutes to block them for your users. Legitimate websites are often compromised to serve phishing / malicious content in subdirectories, including very high-profile domains like government sites. Reliable phishing detection is hard; automatically detecting when something has been cleaned up is even harder.

Having said all that, a company like Google with all of its user telemetry should have a better chance at semi-automatically preventing high-profile false positives by creating an internal review feed of things that were recently blocked but warrant a second look (like in this case). It should be possible while still allowing the automated blocking verdicts to be propagated immediately. Google Safe Browsing is an opaque product / team, and its importance to Google was perhaps represented by the fact that Safe Browsing was inactive on Android for more than a year and nobody at Google noticed: https://www.zdnet.com/article/mobile-chrome-safari-and-firef...

Lastly, as a business owner, it comes down to this: Always have a plan B and C. Register as many domains of your brandname as you can (for web, email, whatever other purpose), split things up to limit blast radius (e.g. employee emails not on your corporate domain maybe, API on subdomain, user-generated content on a completely separate domain) and don't use external services (CDN) so you can stay in control.

Of particular note:

"Don't host any customer generated data in your main domains. A lot of the cases of blacklisting that I found while researching this issue were caused by SaaS customers unknowingly uploading malicious files onto servers. Those files are harmless to the systems themselves, but their very existence can cause the whole domain to be blacklisted. Anything that your users upload onto your apps should be hosted outside your main domains. For example: use companyusercontent.com to store files uploaded by customers."

  • Pardon my ignorance, as I only have a few years of web dev experience. What exactly does it mean to store data on a domain? Does he mean serve data via a domain URL? And if so, how does Google discover that data?

    • Author here. Yes, "serve" is the correct interpretation. It is not clear how Google gets ahold of offending URLs within blacklisted domains (like the article says, there were no offending URLs provided to us).

      Theories:

      * Obtained from users of Google Chrome that load specific URLs in their browsers

      * Obtained from scanning GMail emails that contain links to URLs

      * Obtained from third parties that report these URLs

      4 replies →

    • We’re pretty sure they get reports from Chrome. A security researcher at my workplace was running an exploit against a dev instance as part of their secops role and got the domain flagged, despite the site being an isolated and firewalled instance not accessible to the internet.

      5 replies →

    • We use a fair number of Google products, and you can turn on a lot of enhanced protection, which many businesses do. This means even password-protected / private URLs may generate scans, from what I've seen. I'm not sure how they actually fingerprint files (maybe locally), but it seems pretty broad.

      This seems to work across a lot of Google products (Gmail, Drive, Chrome, etc.), so it scoops up a ton.

      More here:

      https://security.googleblog.com/2020/05/enhanced-safe-browsi...

      I'm not sure if this is related to Safe Browsing. We can also turn on more scanning and other features for all email users.

      The key point, though: if you allow users to PUT files onto your S3 (even private / signed-in), Google may scan them. That means if a user uploads a suspicious file to a trouble ticket system and there IS a virus in there and Google sees it, wham. Obviously most folks will segregate those uploads into their own S3 bucket by user/account to avoid contamination, but you really have to be careful not to host viruses AT ALL on your key domains.

  • I imagine your service still won't have a great time when Google blacklists companyusercontent.com

    A proper mitigation would be to serve user data from one domain per user, no?

“Don't host any customer generated data in your main domains. ”

This is extremely important for multiple reasons. One reason is the blacklisting mentioned in the article; the other is security: browsers typically implement security policies at the domain level as well, such as cookie scoping. Putting all user-generated content under a completely separate domain avoids a whole category of potential issues.

  • How do you do this in practice though? Let's say my marketing site is at turtlepics.com and then the pics, captions, feeds, etc are served off of turtlepicscontent.com.

    So I can serve my app off of turtlepics.com, that's fine. But it can't load any content directly. I'd have to have a separate <script src="https://turtlepicscontent.com/feeds/erik"> or whatever that loads a user's feed. But that needs to be authenticated too, so I have to then authenticate the user on that domain (https://turtlepicscontent.com/feeds/erik?onetimekey=a9e79c58...) as well, at which point the credentials are present in the unsafe domain's cookies as well, and the jig is up.

    Or do you continually generate fresh one time keys in the safe app, so that you don't need cookies on the content domain?

    Even then, someone can still bring down the entire turtlepicscontent.com domain with malicious content. Which... well, at least your marketing site and your login still works. But the site is still fully down at that point. I guess that's better than nothing, but still pretty annoying.

    Or is the idea just to wall off uploads specifically, but continue serving text off the main domain, presuming you're sanitizing text correctly?

    I guess you could have some fallback to a safe-ish domain with an older read-only backup of the content database? Still not ideal. I guess sharding your users onto multiple domains based on account age might help a bit too.

    • You don't necessarily need to authenticate users on that domain with a cookie. An HMAC token would be ideal, because you don't have to maintain state.

      Don't hardcode the content domain. In case the content domain gets flagged, it should be easy to switch to a new one.

      The assets themselves (images, scripts, etc.) can have any browser cache expiration time. The cache duration of HTML documents is what matters: once it has elapsed, browsers will start using the new content domain.

      1 reply →
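A minimal sketch of the HMAC-token idea above, assuming the app and content domains share a server-side secret; all names and parameters here are hypothetical:

```python
import hashlib
import hmac
import time

# Shared by the app domain (which mints links) and the content domain
# (which verifies them); never sent to the browser.
SECRET = b"server-side secret shared by app and content services"

def sign_url(path: str, expires_in: int = 300, now=None) -> str:
    """Append an expiry and HMAC tag so the content domain can verify
    the link was minted by the app domain, with no cookies involved."""
    exp = int(now if now is not None else time.time()) + expires_in
    tag = hmac.new(SECRET, f"{path}|{exp}".encode(), hashlib.sha256).hexdigest()
    return f"{path}?exp={exp}&sig={tag}"

def verify_url(path: str, exp: int, sig: str, now=None) -> bool:
    """Check expiry, then recompute and compare the tag in constant time."""
    if int(now if now is not None else time.time()) > exp:
        return False  # link expired
    expected = hmac.new(SECRET, f"{path}|{exp}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

Because the token is self-validating, the content domain stays stateless and cookie-free, which is exactly what keeps credentials off the "unsafe" domain.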

  • For example, if someone manages to upload HTML and trick your system into serving it with a content type that browsers will interpret as HTML, then they can modify or exfiltrate your user's cookies. This could allow impersonation attacks, XSS, etc.

    (Disclosure: I work for Google, speaking only for myself)
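One common mitigation for the risk described above is to never serve user uploads with a renderable content type. A sketch of the relevant response headers, as a hypothetical helper; the exact header set is an illustration, not a complete defense:

```python
def safe_download_headers(filename: str) -> dict:
    """Headers for serving a user-uploaded file so the browser never
    interprets it as a page running on your origin."""
    return {
        # Generic type: never tell the browser it is HTML/JS/SVG
        "Content-Type": "application/octet-stream",
        # Force "save as" instead of inline rendering
        "Content-Disposition": f'attachment; filename="{filename}"',
        # Forbid MIME-sniffing back into an executable type
        "X-Content-Type-Options": "nosniff",
        # Belt and braces in case something renders anyway
        "Content-Security-Policy": "default-src 'none'; sandbox",
    }
```

This blocks the cookie-theft scenario, but note it does nothing for the blacklisting problem in the article: the bytes are still hosted under your domain, which is why the separate-domain advice stands.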

I learned the hard way that companies other than Google also contribute to the blacklist. A site I was working on got falsely flagged by netcraft.com (which they admitted after I spent a week explaining it to them). They do some kind of active AI cyber-defence bollocks and have Netflix as a customer. Their Automated Idiot classified our login page as trying to phish Netflix.

The fun part of this is that I could have prevented this if I had seen the warning email that Google sent me, but since Gmail classified it as an email phishing attempt, I never saw it (straight to spam folder). How ironic.

Consequences:

- Our website was blocked in all major browsers, not just chrome

- AWS, who also look at the blacklist and were contacted by netcraft automatically, threatened to delete our account. I had to convince both parties that we did nothing wrong

- One week offline

If their claim is false, then is it, in any jurisdiction, libelous?

Maybe legislation bringing consequences for false claims would help ensure that algorithms, and the support teams that monitor them, do a better job. In an internet-focused world, especially one under lockdowns, wiping sites off the internet with false claims is a heinously bad act.

  • I'm unsure whether it would be ruled libel, but I lean towards yes it would. There are two ways of seeing it:

    1) It is a false statement by Google themselves (so no §230 protection) that caused material damage, and is thus libelous.

    2) It is an opinion protected under free speech, and the free behavior of a private company; words like "may have" show it is not a statement of fact, and "deceptive" is just an opinion.

    Yet it feels very wrong, and definitely Google's fault, and Google should be responsible for the damages, morally speaking.

    It's more than just a false statement: the pop-up is keeping users from visiting the website. However, Google doesn't intend to harm these companies in order to gain a competitive advantage; it just harms them accidentally, so the monopoly argument also has problems.

    It seems to me that we need a new law, or that current jurisprudence has let this one slip through, and perhaps (in America) there will never be a proper remedy for this situation, due to divergent jurisprudence in this space that left the gap open.

    I would like to know whether it has been tested in court, or if anyone is in process of doing so.

  • As of today, there is no legal protection framework for digital services.

    Banking is heavily regulated; you are protected by hundreds if not thousands of laws.

    For digital services? Twitter and Google can legitimately suspend ALL your accounts because you liked a Trump video on YouTube or tweeted something « hateful » at Biden.

    You can try to go to court. You will lose 100% of the time. They are private businesses operating within their own terms; there is no « false » flag or wrong « ban ».

    They're private businesses offering a free service, and they can cease to offer it at any moment they want.

    • In this case they do not provide a service to the OP. There is no agreement between OP and Google.

      This is happening on browsers of their customers. And I'm quite sure that if Google hits a company that competes with Google services there must be a law that they will be breaking.

      There was a big case in Poland where Google blocked a SaaS web shop provider using the same exact mechanism [0]. Polish courts decided that Google claims displayed on block page were untrue. Unfortunately, the suing company did not receive compensation, because Google Poland does not operate Chrome browser. The court indicated that the right party to sue is Google incorporated in USA...

      [0] https://www.silesiasem.pl/iai-przegralo-proces-sadowy-z-goog...

      5 replies →

  • There may be a number of civil causes of action available...

    But litigate against a multi-billion dollar tech company? good luck.

    These companies are borderline immune to prosecution by the government, much less a small business.

  • This seems like something the FTC should be looking into, abuse of market position.

I can confirm everything that was said in that article. I run a free dynamic dns service (freemyip.com) and every time someone creates a subdomain that later hosts some questionable material, Google will immediately block my whole domain. Their response time for clearing these up varies from a few hours to two weeks. It feels completely random. I once had a malicious subdomain that I removed within two hours, yet the ban on Google lasted for more than two weeks. Now, this is a free service so bans like these don’t really matter that much to me, but if it was a business, I would have most likely gone bankrupt already.

I noticed that recently, they are only sending me the warning, but don’t block me right away. Perhaps after a few years of these situations I advanced to a more “trusted” level at Google where they give me some time to react before they pull the plug on my domain. I don’t know. But I would be truly petrified of Google if this was my real business.

  • Have you considered requesting that your domain be added to the public suffix list? https://publicsuffix.org/

    If subdomains of your domain should be treated as independent sites, the public suffix list is (sadly) how you communicate that to browsers.

    (Disclosure: I work for Google, speaking only for myself)
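For illustration, this is roughly how a public-suffix lookup determines the "registrable domain", the granularity at which cookies (and often reputation) are scoped. A toy sketch with a tiny hard-coded sample of the list; real code should consume the full list from publicsuffix.org, including its wildcard and exception rules:

```python
# Sample entries only; the real Public Suffix List has thousands,
# including private entries like cloudfront.net.
PSL_SAMPLE = {"com", "co.uk", "cloudfront.net", "freemyip.com"}

def registrable_domain(host: str) -> str:
    """Registrable domain = longest matching public suffix plus one label."""
    labels = host.lower().split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in PSL_SAMPLE:
            if i == 0:
                return host  # host *is* a public suffix
            return ".".join(labels[i - 1:])
    return ".".join(labels[-2:])  # fallback: last two labels

print(registrable_domain("d123.cloudfront.net"))    # d123.cloudfront.net
print(registrable_domain("example.co.uk"))          # example.co.uk
print(registrable_domain("sub.user.freemyip.com"))  # user.freemyip.com
```

This is why a dynamic-DNS domain that is on the list gets each subdomain treated as an independent site, while one that is not on the list shares a single reputation across all of its users.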

    • Fascinating. I had never heard of this, and cloudfront.net is in there, which might provide a clue as to why Google only blacklisted our subdomain and not the whole thing (imagine that!).

      Is there any downside to being on this list?

      1 reply →

  • Author here. This is fascinating because I figured Google would definitely not ban cloudfront.net entirely and that's why they blacklisted the subdomain, but had this been hosted on our actual company domain, would we have been spared?

1- Ban self dealing.

Even the appearance of a conflict of interest should be treated as an actual conflict of interest.

Among all the other countermeasures being considered, breaking apart these monopolies' end-to-end integrations should be a top priority.

For comparison: I'm a huge Apple fanboy. I'm in a happy monogamous relationship with Apple (h/t NYU Prof Scott Galloway).

There's no question their awesome products are largely due to their keiretsu, monopsony, and other anti-competitive practices. So despite my own joy, I also support breaking up Apple, for the greater good.

The same applies to Google's offerings. Google Chrome cannot be allowed to operate without oversight. Once a product or service becomes an important pillar in a market, it must be held accountable.

2- Fair and impartial courts.

Governments make markets. Google (et al) act as sovereign governments running their private markets. This is unacceptable.

We all must have the right to negotiate contracts, appeal decisions, and pursue other remedies in tort, adjudicated in open, fair, impartial courts overseen by professional and accountable judges.

In other words, I demand the rule of law.

Again using Apple as my example. As a customer, I benefit hugely from Apple's App Store, where they vet and curate entries. This is awesome.

But Apple must be held accountable for all of their decisions. All participants must have the right to sue for damages. In a fair and impartial court system, independent of Apple's total control over the market.

Similarly, however Google is administering the Safe Browsing infrastructure, it must be transparent, accountable, and auditable.

--

I'm still working on this messaging, phrasing. Criticisms, editing, word smithing much appreciated.

  • > Criticisms, editing, word smithing much appreciated.

    My loose thoughts, feel free to use. (Reordered 2 before 1.)

    2. In any biggish privately regulated market, membership needs to be based on public, objective rules and fall under a real jurisdiction. If you paid and obeyed the regulations and have been banned/mistreated, you can sue.

    1. For any market, if a company (Google or other) has a clear majority of it, they have additional responsibilities.

    "Customers are free to go to our competitors" does not tell the full story (as the OP illustrates). The cost of switching is the real issue here.

  • Must be held accountable -> must be operated autonomously from the rest of the business

    • Yes. I'd like this better explained. Those "Chinese walls" meant to keep business units apart always seem to be completely fictional. Ditto "self-policing".

One of the apps my company makes is a chat app. When someone clicks a link in chat, we bounce them to a URL redirect page (a "Warning, you're leaving $app, don't enter your account password/information"-type phishing warning) with a "Continue to $url" button. We also maintain a domain blocklist to block known phishing sites in our app. Because of this, Google blocked our entire domain for "malicious URLs" (the "This link was blocked" page). It took us weeks to get it unblocked. Just an utter pain in the butt. We're an established business, but having our entire website blocked by Chrome for weeks nearly killed the entire app.

> I received an automated email confirming that the review had been successful around 2 hours after that fact. No clarification was given about what caused the problem in the first place. ... We never properly established the cause of the issue, but we chalked it up to some AI tripping on acid at Google's HQ.

I expect more of this Kafkaesque experience to come in the future.

This is no longer a technical problem, but a social one. It can only be solved through legislation.

  • Author here. The second time around, the review confirmation email took around 12 hours to get to us.

    • Thank you for posting all the info here. And I’m glad that you managed to fix the problem. I think it must have caused you a lot of stress.

      1 reply →

This reminds me of email blacklisting. When I was "young" I operated an email server for 6000 users. Keeping that server and our domain away from blacklisting was a full-time job.

It wasn't enough to secure your server: Any spam or virus coming from the internal network through that email server could potentially blacklist us. Basically, you had to treat your users as untrusted, and run anti-spam and anti-virus filtering that was as good as whatever the rest of the Internet was running.

IIRC, although blacklisting was done by non-profits, it was still rather opaque: Blacklisting should be traumatizing, so that you (and your higher ups) are forced to do a proper risk assessment and actually implement it. It was also opaque to make it harder for the bad guys to move quickly.

I hate the increasing influence that big tech has on small tech. But keeping web and email safe and clean is a cat-and-mouse game, which, unfortunately, also adds burden to the good folks.

  • Today Microsoft is the worst. It blacklists your IP from unsuspecting customers using Outlook, live.com, etc., and there is no way to recover from it without becoming a customer yourself. It's vicious because the users of their products are mostly businesses, and Microsoft is acting as a gateway for doing business with them.

    • Definitely annoying. But how much of this is anti-competitive business practice, and how much is "raising the bar for the bad folks"? Unfortunately, the latter inevitably adds burden to the good folks too.

The section about ants and Google shifting on its planetary chair is perhaps the best part of this article. A sobering way to look at it.

I run Neocities. After our last run-in with Safe Browsing (https://news.ycombinator.com/item?id=21907911), Google still hasn't figured out that the web is their content providers and that they need to support them. Treating their producers with contempt and neglect is a glorious example of how shortsighted the entire company is right now about its long-term strategy (how many ads will you sell when the web is a mobile Facebook app?). They should, as soon as possible and as a bare minimum, start providing representatives and support for the content providers that make people actually use the web, and help them to be successful, similar to how Twitch has a partnership program.

  • > If things ever got extremely horrible, I used to have a friend that worked at google but she no longer works there. The other person I knew that works at google stopped responding to my tinder messages, so I'm pretty much doomed the next time they do something ultra crazy and I need emergency support.

    Doing good for the sake of the web, even while dating, that's some next level dedication. :)

    • Sad nerd seeks third to join 24/7 D/s relationship with my internet daddy

      (just lightening the mood a bit, but it's a true story. I wasn't asking for google support)

  • Just want to add that Neocities is a cultural treasure and I appreciate the work you put into it! I'm sad to hear that Google "Safe" Browsing once again rears its ugly head to block legitimate websites, while I still see scams and phishing show up in ad-sponsored links on Google search results.

    I could foresee in the future all of us having to pay the toll so our hosted websites are considered "safe" too...

    • Thanks!

      Yeah, I'm not saying Safe Browsing shouldn't exist, but it definitely needs some improvements, and feedback mechanisms appropriate for how incredibly powerful/dangerous it is.

  • > The other person I knew that works at google stopped responding to my tinder messages.

    This has to be the best anecdote for Google's broken tech support that I've ever heard. :)

  • If Google is falsely claiming you're malicious and it's harming your business, it seems like a pretty clear case of slander/tortious interference.

    • I mean we could certainly use the money obviously, but it's not really my goal to sue Google (I probably can't afford it anyways). I just want them to improve. I see them as a partner and only ask that they see us as the same. They certainly have the resources for it.

  • > because we block uploads of executable files, it is _literally impossible for users to host malware on our servers_

    How does this stop bad actors from exploiting bugs in e.g. V8 with malicious JavaScript?

    • > How does this stop bad actors from exploiting bugs in e.g. V8 with malicious JavaScript?

      You're correct: it doesn't. Blocking executable files isn't enough. JavaScript files, zips containing executables, malicious Word files... all of these are vectors.

      1 reply →

  • Of course they care. They've taken over everything they've been able to take over and they're still going strong. This is not by mistake. This is why Google needs to be broken up.

I wonder if it would be faster to deal with this through legal. I’m not a lawyer, but I wonder if you could send a C&D to Google legal or something because this seems like an actual case of slander and reputation damage.

  • If your systems have any number of nines in their SLA, drafting a letter to Google's legal department is not a viable strategy.

  • If you are a big enough company your lawyers could have a stern but relatively friendly chat with Google’s lawyers.

    I can neither confirm or deny this myself...

    • Yeah my thought behind this was you are a large enough or wealthy enough company that you can afford lawyers. If you are an individual or mom and pop business whose blog or small e-commerce shop are blocked then you are probably SOL.

  • Once you enter litigation with Google, good luck accessing your Android.

    You may believe this is extreme, but many people have had their Gmail account suspended for no known reason. So imagine if they actually have a reason...

    • So de-google first, then sue.

      Otherwise you might as well give up and conclude that google not just controls the internet but is also above the law.

I provide Windows builds of ffmpeg, linked via http://ffmpeg.org/download.html. The site is entirely static, no user data is collected or stored.

Starting in late October, lasting for around a month, users would get the dreaded red page upon visiting the site at https://www.gyan.dev/ffmpeg/builds/

Search Console would show a couple of files flagged as 'install malicious or unwanted software'. Never mind that all the files are plain archives (7z, ZIP) with no installers or even self-extraction, containing CLI apps. These file URLs, when scanned via VirusTotal (Google-owned), would be flagged by Google Safe Browsing and by no other engine. The weird thing is, the same files mirrored at GitHub would be detected as clean. A review request at SC would get rid of the warning temporarily, only for it to return after a day or two.

I found no support email, so I opened a thread at the Google Webmaster community (now called Search Central community). But there was no help, and none of the regulars seem to be Google employees. Finally, I found an email through Mozilla's page on their use of Google's Safe Browsing blacklists at https://support.mozilla.org/en-US/kb/how-does-phishing-and-m... which leads to https://safebrowsing.google.com/safebrowsing/report_error/?t.... This page's title is 'Report Incorrect Forgery Alert', which would indicate a different purpose, but I managed to get hold of human attention. After 10 days or so, the warnings disappeared. To date, I don't know what triggered the warnings in the first place, and so I don't know how to prevent a recurrence.

We got hit by this as well. Very similar story to this and others shared in this thread: Use an S3 bucket for user uploads - and Google then marks the bucket as unsafe. In our case a user had clicked “Save link as...” on a Google Drive file. This saves an HTML file with the Google login page in some cases (since downloading the file requires you to be logged in). The user then proceeded to upload that HTML file. Then it was automatically marked since it looked like we were phishing the Google login page.

It should be noted that Firefox uses the Google banlist as well so switching browsers does not work!

I think another take away from this article is “don’t allow users to upload malicious files that you then host from your domain”

This seems easier to do than jumping domains.

  • > I think another take away from this article is “don’t allow users to upload malicious files to your domain”

    I disagree. At which point did we all accept Google's role as de facto regulator and arbiter of the Internet? Why should we tacitly accept the constraints they deem appropriate and modify the way we build the web?

    In other words, those are our domains, our apps, our systems and we'll do as we please; that includes worrying about content moderation, or not.

    When and why did we accept google as the Internet's babysitter?

    Apologies if this sounds aggressive, but your takeaway reflects an appalling and quite fatalistic mindset, one which I sadly believe is increasingly common: big corporations know best, big corporations say and we do, big corporations lead the way.

    On the other hand, probably I'm just biased and tired considering how tiresome it's been to explain to my friends and family why Signal is the better alternative after the WhatsApp/Facebook fiasco.

    /EndRant

  • Pretty sure the main point was that a private company can effectively delist you from the internet without any rhyme or reason. Most of us have heard Google horror stories when you use their products; the fact that you can be free of them and still have any new customers bounce from your site in terror is, uh, terrifying.

    I would like to emphasize that, of course, they have good stated reasons for warning users before they access websites. The issue is that they are a private company whose behavior affects all major browsers and (for kicks) they have an extremely opaque review process.

    If you ran a "divest from Big Tech" website which started gaining steam, they could delist you like this, and the only real force stopping them is public backlash. If you think you can effectively sue Google to stop them, I have a bridge to sell you.

  • Author here.

    That is definitely a good idea, and I recommend it. But that should not be the main takeaway.

    In our particular case, that was not found to be the problem (we think it was some sort of false positive), and there are valid reasons for users to do that anyway (upload a phishing email attachment onto an IT support ticket, for example).

  • I think the author highlights the main issue at the end of the article. This is where pressure needs to be applied. I get it, Google’s process probably protects a lot of end users from malicious sites. Getting a real business added to this blocklist by a bot though is not cool. Perhaps a process to whitelist your own domains if this power can’t be wrangled from Google.

    > Google literally controls who can access your website, no matter where and how you operate it. With Chrome having around 70% market share, and both Firefox and Safari using the GSB database to some extent, Google can with a flick of a bit singlehandedly make any site virtually inaccessible on the Internet.

    > This is an extraordinary amount of power, and one that is not suitable for Google's "an AI will review your problem when and if it finds it convenient to do so" approach.

    • > Getting a real business added to this blocklist by a bot though is not cool.

      Real businesses can (and often do) host malware too. There was a notable event where php.net was hacked and hosting malware, which Google flagged. The owner of php.net was pretty mad at first and claimed it was a false positive. It wasn't.

      2 replies →

Being completely blacklisted is very bad, but at least you know that something needs fixing. Imagine if Google partially punishes you and downranks you in search for no reason. That is harder to figure out. It took us several months to discover such a problem, until finally we registered with Google Webmaster Tools.

  • What are you talking about? The article said that they didn't change anything, because they found nothing wrong with the site. The ban from Google was totally random, without any explanation, and it went away without any changes or explanation of what was wrong.

> Proactively claim ownership of all your production domains in Google Search Console.

That's one of the first things you should do when registering a domain and setting up a website. It takes about 2 minutes. So I wonder a bit why a business of this size would only learn to do this through such a crisis.
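
For what it's worth, one of the documented ways to prove ownership to Search Console is publishing a DNS TXT record of the form `google-site-verification=TOKEN`. A minimal sketch of checking whether such a record is present (the record values and token below are made up for illustration):

```python
def has_search_console_verification(txt_records, token):
    """Return True if the expected Google Search Console verification
    record is among the domain's TXT records.

    txt_records: list of TXT record strings from a DNS lookup.
    token: the verification token Search Console asked you to publish.
    """
    expected = f"google-site-verification={token}"
    # TXT values are often returned wrapped in quotes; strip them first.
    return any(r.strip('"') == expected for r in txt_records)

# Hypothetical TXT records for example.com
records = ['"v=spf1 include:_spf.example.com ~all"',
           '"google-site-verification=abc123"']
print(has_search_console_verification(records, "abc123"))  # True
```

In practice you would feed this the output of an actual TXT lookup (e.g. `dig TXT example.com`); the point is just that the whole ritual amounts to one static record.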

  • This is sad. When you open a business in the real world, sure you have to tell the authorities about it (because it's the law!). When you open a digital business, you have to tell Google (via Google Search Console) about it... But Google is not the law, not even an authority; it just happens that Google owns google.com and Chrome and that makes Google the de facto Godfather of the internet: if you don't comply, your business is practically dead. Again, sad.

  • Author here. The impacted domain was a Cloudfront CDN subdomain with random characters in it, not company.com (thankfully!). I doubt anyone signs up for Search Console on that type of domain that they don't even really own.

Is there any reason that Google couldn't, or wouldn't, repurpose Google Safe Browsing to blacklist sites that are "unsafe" due to under- or poorly moderated content? E.g. doing this to Parler after they find hosting again? I can't think of a reliable one.

  • There's a very obvious reason not to do that: if you apparently maliciously cry wolf a few times, people won't trust your cries any more, and, for example, other browsers might choose to stop using the Google Safe Browsing list.

    • No, I don't think that's how it would play out.

      1. Google bans parler.com on Jan. 8th by adding it as an "unsafe URL" to their blacklist.

      2. Mozilla issues statement: "While we don't believe it was prudent to use the Safe Browsing blacklist for this purpose, given recent events, we will not be unblocking parler.com, and do not currently deem it necessary to maintain a separate safe browsing list."

      3. Something similar happens a few months from now, and this time there's no statement from Mozilla or Microsoft. It has now become accepted that blacklisting less-moderated social media, which can cause real-world harm, is a normal use for the Safe Browsing list.

      The problem is, if a mainstream browser goes against the flow, it becomes "The Nazi Browser." Its market share was already less than Chrome's, and now it's getting all these new users who are outcasts. This is a Hard Problem of moderation in a small market. You can't be the one out of three players who moderates less, lest you be overwhelmed by undesirables and less-desirables.

      5 replies →

    • Author here. I think it's too late in the cycle for that. This list is too widespread and anyone that is banned from it needs to immediately work around the issue somehow, therefore reducing the visibility of the problems.

    • So what would they use instead? It's not like there are any other free, real-time and mostly accurate malicious-URL databases around for people to plug into their browsers and products.

      4 replies →

    • I would agree, but "apparently maliciously" is too subjective.

      According to US conservatives this is what Twitter, Facebook, Amazon, Google, Apple, Twilio, Snapchat, etc all did to Parler for political reasons.

      According to US progressives/liberals it was absolutely not malicious, but rather the polar opposite: protecting people.

      These days there is no common agreement on that stuff, and given the recent events I see no reason to believe that they wouldn't do as GP asked.

      2 replies →

  • Users would start to ignore the warnings and proceed anyway, or even turn safe browsing off.

So, essentially they let someone host malicious content on their CDN, which led to Google blocking it. I don't see the scandal here. Also, it seems Google fixed the issue within 2 hours, which is quite good TBH.

There are many open-source & commercial IOC lists in distribution from vendors like CrowdStrike, Team Cymru, etc., and a lot of them are being fed into SIEM systems, firewalls and proxies at companies. If you happen to end up on one of these lists, it can take months or years to clear your reputation.

  • If you're going to comment that they did something wrong, you should consider reading the article and noticing that the safe browsing flag didn't mention a URL, and that the block was removed without any follow-up once they requested the removal.

> losing access to their GMail accounts and their entire digital life.

This is why my email address is @ a domain that I own. Thus, if my hoster goes ventral fin up, I find another hoster. I might lose some time, but I won't lose everything permanently.

My mail reader (Thunderbird) is also configured to always download all new email and delete it from the server. Hence I have backups going back 25 years, which has turned out to be valuable many times. One case was when I was reconstructing the timeline for "History of the D Programming Language": I had a solid resource rather than my barnacle-encrusted memory.

https://dl.acm.org/doi/abs/10.1145/3386323

It's not just startups. I work at a major company and we've had internal domains flagged in the past due to internal security testing. We resolved it by making some calls to people at Google, because the Safe Browsing dashboard is so slow to fix things.

This is especially troublesome if you allow customers to upload code to run on your systems (e.g. JavaScript for webpages or interactive data analytics). You have to isolate every customer on separate domains.

  • > You have to isolate every customer on separate domains.

    Allowing unvetted JavaScript to be served from your main domain is something of a security risk anyway.

  • Do you need a real domain for each customer or is a subdomain sufficient isolation?

    • Real domain. If you have customer1.example.com and customer2.example.com, and customer2.example.com serves malware, all of example.com can be flagged.

      1 reply →
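
To make the "real domain" point above concrete: reputation systems generally key on the registrable domain (eTLD+1), so every subdomain of example.com shares one fate, unless the parent domain is on the Public Suffix List (which is how, e.g., individual github.io sites get treated separately). A toy illustration, which naively assumes one-label TLDs rather than consulting the real Public Suffix List:

```python
def registrable_domain(hostname: str) -> str:
    # Naive sketch: treat the last two DNS labels as the "site" that a
    # domain-level blocklist acts on. Real implementations consult the
    # Public Suffix List instead of assuming one-label TLDs.
    labels = hostname.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])

def share_blocklist_fate(a: str, b: str) -> bool:
    # Two hosts on the same registrable domain rise and fall together.
    return registrable_domain(a) == registrable_domain(b)

print(share_blocklist_fate("customer1.example.com",
                           "customer2.example.com"))    # True
print(share_blocklist_fate("customer1.example-a.com",
                           "customer2.example-b.com"))  # False
```

This is why separate registered domains (or getting your shared-hosting domain onto the Public Suffix List) isolate customers, while plain subdomains do not.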

This is not new; such things happened many times in the past (25 years ago Microsoft was the behemoth trampling small companies) and will happen again. I do not think Google is doing it consciously -- this is probably just collateral damage from some bot or rule.

The way to handle it is to reduce dependencies on the cloud. This does not mean cutting cloud services altogether, but once the company is big enough (and the author talks about 1000s of SMEs and millions of users), plan for graceful degradation with a fallback to a different provider and another fallback to owned servers.

This takes work and reduces capability during the crunch, but it is often a lot easier and cheaper than people think if planned properly and not in a shotgun style of crisis engineering. My 2c.

  • Author here. The scary bit is that the blacklist is enforced client side in Chrome and other programs. Our servers and systems were running just fine when this happened, but if Google Chrome refuses to open your website, you're still down.

    The closest parallel I can think of is expired SSL certificates, but the level of transparency and decentralization of that system vs. this opaque blacklist is not really in the same league.

    • One derisking solution may be wrapping your web app as a native client. E.g., an Electron app is technically Chrome, but you get more control over its settings. I know Microsoft (SmartScreen) and Apple may block apps for many reasons too, but at least you get more baskets for your eggs.

  • Yeah, I read stories that Yahoo in the 1990s called itself a media company and its product managers "producers" out of fear that once you called yourself a software company, Microsoft would crush you...

    As for using clouds - there is absolutely no point in using them for anything above staging level, or very, very low-scale launches. People should switch away from the cloud as soon as they see even tentative signs of product-market fit.

    • You will save so, so much money switching away from clouds too.

      No, you don't need to use a hundred different AWS/GCP/whatever services, and yes, managing your own infrastructure is a lot easier than you think (and sometimes easier/faster than AWS).

      The Stack Exchange network, at least around 2018 or so, was hosted on 12 servers they own!

      1 reply →

  • > I do not think Google is doing it consciously -- this is probably just collateral damage from some bot or rule.

    "Collateral damage" from some bot or rule just means that Google doesn't care enough about the edge cases (which, at Google scale, are particularly harmful): Google consciously decided this when implementing their algorithms.

  • > this is probably just collateral damage from some bot or rule

    The point is, collateral damage and/or false positives are not acceptable for a service with an impact like this. In the real world, we consider them war crimes, etc. Bots and rules are implementations of policies and policies come with responsibility.

One corporation must not have so much power over billions of citizens of many countries. A power like that must only come from a transparent non-profit organization with a publicly elected management board.

We will get to that point sooner or later. But the road there will be long and painful.

  • > We will get to that point sooner or later.

    Is there anything in particular that makes you believe that it'll eventually happen?

    Because personally my outlook on things is a bit more pessimistic - oftentimes the main concerns of individuals and organizations alike are financially-oriented and few share the enthusiasm for transparency and openness like Richard Stallman does.

    The trend of SaaSS ( https://www.gnu.org/philosophy/who-does-that-server-really-s... ) because of companies not wanting to invest time in engineering their own solutions or even using FOSS, alongside how many of them are handling GDPR and even cookie compliance with the use of UX "dark patterns" (e.g. it being easier to accept advertising cookies than to deny them), doesn't let me keep a positive outlook on things.

    It feels like we'll all be reliant on the "tech giants" for a variety of things for the decades to come, even "de-Googling" oneself not always being feasible.

    • >Is there anything in particular that makes you believe that it'll eventually happen?

      Humans have demonstrated the ability to eventually improve social systems to make them account for the needs and demands of the majority of stakeholders. In the offline world this has evolved into what is known as democracy. It started several centuries ago and eventually evolved into modern governments as we know them - publicly elected management boards.

      Recently, there was an excellent article [1] on HN. It rightfully compared the current state of the internet to the feudal times and warlords common in the offline world many centuries ago. From that point, through a long and painful process, we've come to elected governments as the most sustainable form of governing a large number of humans. All other forms of government turned out to be unsustainable (no matter how attractive they were to certain individuals or organizations) and inevitably led to all kinds of social catastrophes.

      I believe the same will eventually happen to the internet, our brave new world we used to love but now seem to be increasingly disenchanted with.

      [1] https://locusmag.com/2021/01/cory-doctorow-neofeudalism-and-...

I’ve been increasingly wary of Google’s offerings altogether. Their ban hammer seems to be driven by Mr. Magoo, who looks at everything, sees threats, and makes judgements.

  • Yes, but an inverted Magoo. Mr. Magoo assumed the best intentions of everything he bumped into (and misunderstood).

Can anyone "in the know" objectively comment if Google Safe Browsing (GSB) has had a net positive result or outcome for the Internet, at large?

Has GSB helped users, more than it has hurt them?

The anti-Google rhetoric [on HN] is becoming more tiresome as of late. Personally, I welcome the notifications in my browsers that a domain is unsafe. I can't possibly be the only one.

  • The problem, from HN's perspective, is that false positives on GSB hurt businesses a lot more than they hurt users or the internet at large.

    If I'm a random person browsing the internet at large, and a website I try to visit gets flagged as "possibly malicious", well, I probably didn't need the information or services on that particular website that badly anyway. I can find another website that offers the same information and services easily enough. Meanwhile, if my computer or browser is infected with malware, that's pretty bad for me personally. I could lose money, time and personal data and security. The potential consequences are bad enough that I really shouldn't risk it.

    On the other hand, if my business is blocked by GSB, that is very bad for my business. The customers I don't lose are going to lose confidence in me. Meanwhile, the cost to me if I am accidentally hosting malware is pretty minimal. Even if a large number of my users are harmed by the malware, they're unlikely to be so harmed that they stop paying me, and it's pretty hard for users to know where they picked up malware, so it's unlikely to be traced back to me. I've never actually heard of a lawsuit from an end user against the website they downloaded malware from.

    A false negative from GSB is a lot worse for internet users than a false positive; an internet business, on the other hand, would prefer a false negative to a true positive, let alone a false positive.

    Add in that internet business owners (or people highly invested in internet businesses through their jobs) are over-represented on HN, and it's no surprise that HN is not a fan of Google Safe Browsing.

    • > an internet business, on the other hand, would prefer a false negative to a true positive, let alone a false positive.

      [Emphasis mine]

      This is crucial and it's why the sub-threads imagining suing Google aren't going anywhere. Google will very easily convince a judge that what they're doing is beneficial to the general public, because it is, even though some HN contributors hate it because they'd prefer to meet a much lower standard.

      What I'm seeing a lot of in this thread is people saying OK, maybe a burger we sold did have rat droppings in it, but I feel like our kitchen ought to be allowed to stay open unless they buy at least a few hundred burgers and find rat droppings in a statistically significant sample and even then shouldn't I get a few weeks to hire an exterminator? Isn't that fairer to me?

  • I think GSB is great because there is no other product like it, it is very fast to respond to most threats and it can be used for free. The only thing about it that's not great is, in typical fashion, the lack of transparency about some of the processes. Not about how phishing verdicts are created, this should remain a closely guarded secret, but about what actually happens when you send a report or send a review request.

  • Author here. It's not really rhetoric, I wrote the post because it's downright scary that your business of over 10 years can vanish in a puff of smoke because Google didn't bother to require an offending URL field in an internet-wide blacklist. At the level they operate, there needs to be a semblance of due process.

  • What about false positives?

    From the fine article: one Google system was detecting emails coming from another Google system as phishing. This is ridiculous.

    • It's needed to make sure no one can claim bias - for example, that Google is blocking competitors or unfavourable information.

  • It's hard to argue against "safe". If they would name it "filtered browsing" it might be something arguable, but "safe browsing" who wouldn't want that?

  • If Safe Browsing were offered by some neutral internet organization (e.g., similar to IANA) I wouldn't mind. But it's offered by a private company: so it's naive to think that GSB benefits anyone other than Google itself.

  • I'd guess a large net positive among the general population but maybe neutral for the tech literate like HN readers. Most tech-literate people are careful enough to recognize tactics used by phishing sites and won't click on phishing links, or would click and immediately figure out it's phishing. That cannot be said for the general population.

  • It seems similar to the move from client side spam filters to the server side.

    Spam filtering really didn’t get better with the change (for me), but now it’s orders of magnitude harder to run an email server.

    Taking the article at face value, GSB makes it much harder to run a reliable web site. Has centralization of email into surveillance organizations hurt more than the benefit from saving bandwidth to download spams, and automatically deleting them at the client?

    How much damage will (further) centralization of web hosting onto social network sites (Facebook, Twitter, GitHub, Stack Exchange, etc, etc.) hurt the internet?

    It’s arguably already done more harm than good. I can’t even find a decent recipe that a high end laptop can efficiently display. I used to be able to download cookbooks worth of recipes, and my 386 could load them instantly.

The story he links to, about the "Online Slang Dictionary" being removed from Google search because the founder of Urban Dictionary was friends with Googlers (true) and (allegedly) used his influence, is fascinating:

http://onlineslangdictionary.com/pages/google-panda-penalty/

  • You may want to read these posts:

    https://news.ycombinator.com/item?id=24109168

    • My plans to trickle out details of my conversation with the Google employee were put on hold due to a massive change in my life responsibilities due to the novel coronavirus, but it’s my intention to resume soon.

      As I say on the website, this will culminate in my releasing the MBOX formatted file of the conversation, with full headers.

Eventually, Google will get to the point where regulators will come to gut it, and the crowd will be cheering.

  • But, left with fewer resources, Google's security might become like the security of smaller companies, and the crowd will be crying.

Are there any no-win, no-fee law firms that specialize in these cases? What if, for every hour offline, your SaaS loses X money? For this particular case, what if, due to the service disruption, some customers decide to move their business elsewhere? Enforce an SLA?

  • Author here. That was exactly our situation with the impacted systems. We got lucky with the fast "review" and it happened late enough in the day that only PST customers were impacted meaningfully.

    But still, quite frightening, hence the post. It's not a failure mode we had in mind when we established the SLAs.

Stupid question: Isn't this clear-cut grounds for a defamation lawsuit?

Also, is it possible to have a class-action defamation lawsuit?

The fundamental issue that the author gomox is not stating clearly in his article is that there are no consequences to Google for their actions. None. Literally zero.

I don't think the best plan is to wait and hope for a government to step in and take action. Hope is not a strategy.

Complaining on public forums has similarly done nothing to curb Google's careless wielding of the ban-hammer.

So sue them. Cost them money. Punish them in a material way that they can't ignore.

I can't imagine anything else working...

Teach people how to get past the scary warning one way or another, and spread that knowledge far and wide. With enough false positives their blacklist will be diluted to the point of uselessness and hopefully people will also become better educated in the process.

Google will of course do everything in their power to stop that from happening, but every little bit of opposition helps --- from recommending others to not install censorware browsers, to showing them articles like this --- because this is a fight for the freedom for the Internet. As big as Google is, the Internet is far bigger.

For desktop software, the antivirus "industry" can be almost equally destructive.

For instance, Avast breaks installers of software made with a specific installation framework: https://github.com/wixtoolset/issues/issues/5593

The problem has lasted for years. At one point I tried to contact them, but the people at Avast were either unable or unwilling to fix their software.

Doesn't Safe Browsing require every URL you visit to be sent to G$$gle first? I know Chrome users "have nothing to hide", but this looks like complete surrender.

Yep this happened to me too and I came to exactly the same conclusions.

We have a list of completely separate “API domains” that our scripts talk to and which also host the CloudFront CDN.

We also cohort our customers by Sift score and keep trusted enterprise customers away from endpoints given to new signups. This way if someone new does something sketchy to get you flagged it won’t affect your core paying customers.
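
That cohorting idea can be sketched in a few lines. Everything here is hypothetical: the hostnames and thresholds are invented, and "Sift score" refers to a third-party fraud score where higher means riskier:

```python
def api_host_for(customer_id: str, sift_score: float) -> str:
    # Route risky (high-score) accounts to a sacrificial endpoint pool,
    # so a blocklisting triggered by a sketchy new signup cannot take
    # down the hostnames trusted enterprise customers depend on.
    if sift_score < 20:
        pool = "trusted"
    elif sift_score < 60:
        pool = "standard"
    else:
        pool = "quarantine"
    return f"api-{pool}.example-cdn.com"

print(api_host_for("acme-corp", 5.0))    # api-trusted.example-cdn.com
print(api_host_for("new-signup", 87.5))  # api-quarantine.example-cdn.com
```

The design choice is simply blast-radius containment: if the quarantine pool's domain gets flagged, only the untrusted cohort notices.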

Some web hosts use Safe Browsing to automatically perma-ban any sites on the list. I've been banned from Heroku for a couple of years at this point because one of my sites got added to Safe Browsing as malware and Heroku's systems just automatically perma-banned me (and to make things worse, the address they tell you to send ban appeals to in the ban email just bounces).

My idea, which will be ignored as usual, is that the problem is the monopoly.

The reason we have a monopoly is that the web browser is now a full operating system, so complicated that no group can replicate it.

Start over with a new protocol. Make it content-centric, i.e. distributed protocols with no central servers. Support download-limited lightweight markdown sites for information sharing.

Then for applications and interactive content, add a canvas-light graphics system to web assembly. Again, I suggest limiting download size to keep things snappy. And make sure not to load any applications automatically or put them in the same process as the markdown browser.

If you do it right, you will have a common protocol that is straightforward enough that there can actually be several implementations. And it won't be controlled by one company.

If customers using Google incurs a tax on a business, regardless of whether the business voluntarily does business with Google, why not work on changing that?

Start with a snazzy "our service works better in Firefox" notice. Eventually offer trivial new features in Firefox but not Chrome, culminating in a small discount for using Firefox. Over time, small price increases can render the discounted price the same as the current price, and effectively you are charging your users for using a vendor that costs you to do business with.

Google views Chrome as a moat around their business, keeping other vendors from cutting them off from the revenue stream that powers their entire business. Attack the moat and you might see movement to make your life easier.

It is quite good that Google cares about users. But it does not care about website owners. There is one and only one reason: for Google, the WWW is competition for the Google Play marketplace.

Literally, the open internet is competition for Google. That is why the company has no problem issuing a domain-wide ban, without informing the website owner, without any explanation, and with a scary message shown to website users to make them go away.

The author of the blog post seems to believe it is an AI action. But from what I can see, his company was hit with serious damage due to a company that, I assume, has competing apps on its Google Play platform.

I can believe AI can be the cause, but it should be up to a court to decide whether there was collusion and who should pay for the damage.

This is an area where regulatory action should be taken against Google. Google needs to implement a process with manual review in a reasonable timeframe, or they should be broken up for having monopolistic power over which sites are on the internet.

I wouldn't be surprised if this was done just in order to associate somebody with something interesting Google sees on the Internet and has no ownership information about so "that they know". Benefit of the doubt is already gone.

Can Google be held legally accountable for this behavior? Seems like they are hurting businesses by spreading false information. With their market power, there needs to be some incentive for them to react quicker and with human oversight.

This reminds me of ugliest.app - there was an HN post on it a while ago. And then, surprise, surprise, someone made a "paypal" login page which was hosted on the main domain. It was put on the blacklist; not sure if it still is.

I'll tell you a mini story about a coffee shop I visited a few days ago. That place was hidden in Yelp search when I looked for 'coffee & tea' in my area (their Yelp page existed). While I don't know the actual reason why this happened, I immediately found that coffee shop using Google (as a double check). It charmed me, because it reminded me that if you have the right service, people will find out. Given this, I started to believe gatekeepers might begin losing their grip.

It seems like the FTC should be running this for US based customers and browsers should default to a local resource and/or let users override the default source of truth.

  • Cool, then we can complain about false positives at the FTC instead of at Google!

    IMHO, it doesn't really matter who runs it, so long as they're not actively working in bad faith. False positives are a fact of life, guaranteed so long as we have an adversarial malware ecosystem. (For example, the fixes for bad decisions are pretty much indistinguishable from bad actors evading correct decisions.)

    The other side of the coin is a web that looks like my missed calls list - everything is assumed to be spam and malware infested until proven otherwise. No one will use your startup anyway, because any given site is probably terrible. The whitelist becomes a thing that people maintain in their heads, and, again, you get a massive incumbent advantage.

    The right balance is somewhere in-between, and involves fine tuning the false positive rate. The false positives are always going to be unhappy, and hard to tell apart from true positives trying to keep their scam going.

I have to add that Firefox seems to be using the same logic/data for its safe browsing feature and will happily flag sites as malicious with no human oversight.

Before even imagining all the ways to start regulating a tech company, I think we desperately need a few basic regulations like:

- For every major service offered, the company must provide 3 ways to contact live support, two of which must be immediate, e.g. chat, phone, email. [As opposed to today’s “standard” of having none of these!]

- Every action that can be taken automatically by an AI must be possible for support staff to immediately reverse.

If algorithms they own are operating on a list they maintain and they are making you lose profit, exactly why can you not sue them for those lost profits? What's the legal theory here? A product they own, entirely disconnected from you, is banning you. This is not and should not be OK, nor should you be required to do any special dances and magic gestures to try and mitigate the problem.

The mitigations suggested are easier said than done. In particular, domains can't share cookies, which means switching domains likely means logging out any users that are logged in and losing any local settings. Likewise, splitting your site between different domains makes it much more difficult to share state (such as whether you are logged in) between the sites.
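
The cookie point can be made concrete: browsers decide whether to attach a cookie using RFC 6265-style domain matching, so a session cookie scoped to the old domain simply never accompanies requests to a replacement domain. A minimal sketch (the hostnames are made up for illustration):

```python
def domain_matches(request_host: str, cookie_domain: str) -> bool:
    """Simplified RFC 6265 domain matching: a cookie is only sent when the
    request host equals its Domain attribute or is a subdomain of it."""
    request_host = request_host.lower()
    cookie_domain = cookie_domain.lower().lstrip(".")
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)

# A session cookie scoped to the old domain is sent as usual...
print(domain_matches("app.oldstartup.com", "oldstartup.com"))  # True
# ...but never reaches the replacement domain, so every user is logged out.
print(domain_matches("app.newstartup.io", "oldstartup.com"))   # False
```

This is why an emergency domain switch is effectively a forced logout for every user, on top of losing any per-domain local storage.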

Add to the list of preventative measures:

- Establish a Twitter account for anything dev ops related.

Don't assume you'll have the ability to communicate via your internal infrastructure. It also helps customers to know there is a 3rd party medium for staying informed and getting in touch.

Knowing that such things exist, while minor, is good marketing fodder as well. It walks the "comms are important" talk.

As much as I like to give Google a hard time, this isn't really Google's fault. Always use your own URLs for everything. Also, why would you allow customers to upload files and then make them available? Unless you are Dropbox or similar, that's bad configuration.

This really sounds like "We made some configuration mistakes and now blame Google"

Maybe there should be a law that any business that has over ten billion dollars in annual revenues has to answer the phone when you call them and have a reasonable resolution process for complaints.

If that ruins your business model, cool. Just spin off parts of the business until each one is back under ten billion in revenue and do whatever you want.

  • That’s not a bad idea. In general I believe more and more that businesses exceeding a certain size are harmful to the overall economy. They may be more efficient and generate lower customer prices, but they also harm innovation and prevent smaller companies from succeeding.

  • Ideally, we would be able to choose the best company, such as the one that does answer their phones. In this case we can't, which is the real problem.

Am I missing something? Is there ever a reason to expose a CloudFront url to the end user instead of using a custom domain?

  • Is there a problem with doing it? I don't see how that would have helped in this case (if anything, it might have made things worse if Google decided to ban the 1st level domain, which they certainly won't do for Cloudfront.net).

    • It just seems less professional. It’s much like having a .blogger.com or .substack domain.

      We have been trained for decades not to trust random domains. To the uninitiated, a CloudFront domain is random.

      I know I’m a little taken aback anytime I go to Amazon’s credit card site - https://amazon.syf.com/login/ - it looks like a phishing site.

      1 reply →

Great, so legitimate businesses need to implement tactics commonly used by c2c and malware to operate successfully

Well, as long as you are spending 6 or 7 figures a year on advertising with Google, you'll have an account rep at Google that you can always reach out to. Your ad spending level works as Google's filter for which websites on the internet they actually care about not killing.

  • We spend a nice buck on Google Ads but the impact of getting your SLA-sensitive SaaS app blocked from the Internet is not compatible with reaching out to "someone who might know someone" at a 100K employee company.

There's an effective monopoly on web browsing, and so any private decision here becomes de facto censorship. How can this be constitutional? The ants need to rise up and get some rulings down on this topic; the web needs to be brought back to how it was.

Seems like a good case for strict content security policies and self-hosting static assets.
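
For illustration, a strict policy of the kind the parent suggests might look like the header below, assuming a site that serves everything from its own origin (the directive values are a sketch, not a drop-in policy):

```
Content-Security-Policy: default-src 'self'; script-src 'self'; style-src 'self'; img-src 'self'; object-src 'none'; base-uri 'self'; frame-ancestors 'none'
```

With a policy like this, the browser refuses to load scripts and other assets from third-party hosts, so a blacklisted external domain can't drag your pages down with it, and self-hosted assets keep your site's fate in your own hands.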

I'm always surprised by the gall of Google and other companies that decide for others if websites are suspicious. I'm always sure to disable all those garbage warnings, together with email spam "features".

For a SaaS, CDNs are of limited utility, as you have many returning visitors who have already cached these assets. Of course, YMMV, but for us it was easier to host almost all static assets locally.

Isn't this way of getting hurt by a Google bot a brand-new discovery as of 2008 or so? And the bottom line, that letting users upload things is dangerous, is no newer?

We all let it come to this. We are all lazy as f and only care about convenience and short-term benefit.

That is why we now have the big 5, which are basically too powerful to turn away from.

How long until antivirus and safe browsing start marking websites that are "hate sites" as harmful and start, essentially, censoring the internet?

Isn't the problem here keeping the CloudFront hostname, vs. setting up a CNAME from your own domain to point at the distribution?

  • Not really, we own the entire Cloudfront subdomain, and Google is wise enough to not ban cloudfront.net entirely (now that would be an interesting day on the internet!).

    Having a CNAME in front wouldn't have made any difference.
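
    For context, the CNAME setup described above is a one-line DNS record; the hostname and distribution ID below are hypothetical:

    ```
    ; zone file sketch: serve the CloudFront distribution under your own hostname
    assets.example.com.  300  IN  CNAME  d1234abcdef8.cloudfront.net.
    ```

    (CloudFront also needs the custom hostname listed as an alternate domain name on the distribution, with a matching TLS certificate.) As the parent notes, though, this wouldn't have changed which name ended up on the blacklist.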

Anyone know what happens if you include a resource from a banned domain? Is the resource blocked, or does the user get the red screen too?

root.cern was affected by this in the fall, apparently due to a false positive in the Windows installer. It was resolved relatively quickly (a day or so?), but it was hugely inconvenient for e.g. documentation, and of course the particle physics community has connections. root.cern.ch worked, but the internal links were all over the place.

Thank you for sharing this. I wonder if having a ton of subdomains might also flag Google to blacklist the parent domain...

"...And that's reason number 3955430, ladies and gentlemen, why monopolies are bad and MUST be dismantled."

Is this not libelous? If the site is neither deceptive nor malware-hosting, and Google are telling people that it is?

BTW, did using another giant's (Amazon's) services (like CloudFront) make the problem better or worse?

> A lot of the cases of blacklisting that I found while researching this issue were caused by SaaS customers unknowingly uploading malicious files onto servers.

This is terrifying - what business is it of Google’s what party A uploads to MY servers? And how are they getting that information without dramatically violating the privacy of their users?

  • If party A uploads something to your servers and the stuff isn't publicly accessible, Google doesn't do anything about it. But if that content is accessible by the public, Google feels a need to protect the public.

Soon enough this will be used to block other kinds of "unsafe" sites containing dangerous things like "hate speech".

Posted on Medium, which decided to paywall after years of being publicly available.

  • Author here - I haven't signed up for Medium's "pay the author" thing, which I think should make my content free to read and paywall free, is that not the case for you?

A bit of deception on how their site ended up on the block list. They strangely block out a part of their response, but we can see "was cleared", which sounds a lot like "the malware some nefarious agent put on my site was removed".

How sites end up on the block list:

-they host malware, either intentionally or because they were hacked.

-they host a phishing site, either intentionally or because they were hacked.

Protecting users is a monumentally more critical task than your concerns.

And this system is incredibly valuable. When I get a text pointing to a phishing site, I immediately report it to the safe browsing list. I also notify the nameserver, the hosting agent, and, if applicable, the SSL cert provider. Bit.ly too, if it's in the chain, though they never do anything. [Fun fact: phishers and malware authors love putting bit.ly in the chain because they're paying subscribers, and as domains are taken down they can just change the destination. Bit.ly exists on the backs of scumbags, and should itself be on the safe browsing exclusion list.]

Usually the safe browsing list addition happens within an hour, saving many people from being exploited. The nameserver and host -- DAYS. Namecheap takes an eternity to do anything, even for outrageously blatant phishing sites. GoDaddy - an eternity. SSL providers seem to act quickly, but propagation delays make that negligible.

EDIT: 11 days ago I reported the scn- prefixed netflix.com to all of the above. This is a blatant phishing site, and was mass texted to Canadians. It was blacklisted by safe browsing within an hour, likely saving a lot of people grief.

Namecheap, whom I informed by their email and by their garbage ticket system, still hosts the nameserver and physical hosting for this site. 11 days later. Grossly negligent behavior; there needs to be some window of responsiveness, because these players are just grotesque at this point.

  • Author here. I blocked the message in the screenshot because I narrated the first incident, but took screenshots during the second one, so the redacted part was referencing the first one in which, as described, our domain was cleared without actually doing anything.

    Protecting end users from nothing at all (like I said, there is no offending URL) is not more important than making sure Google doesn't literally gatekeep the entire Internet, IMO.

    • I guess. Odds are that there was something, and you have every reason to state otherwise. You're really focused on the URL, but a whole domain will be tagged when random queries are met with content dispositions with malware, which can be automatically flagged by the search engine.

      As an aside, your commentary about Google alerting to phishing emails seems like you're misunderstanding and trying to use this to further your "it's all random!" claims. They aren't flagging it because of the sender, but instead because the contents included a URL on the blacklist. Google re-scans and when they find URLs that are now blacklisted, they warn about Phishing. This isn't new and they've done it for years, and it seems pretty obvious and logical.

      e.g. "That email you got a while back that claimed it's from the Netflix billing problem website is actually phishing. If you gave them details, that's a problem".

      "Protecting end users from nothing at all (like I said, there is no offending URL) is not more important than making sure Google doesn't literally gatekeep the entire Internet"

      This system protects countless people from malware and phishing daily. I have no reason to believe your particular claims about this (though I'm skeptical given that you are blocking details that would allow others -- such as Google -- to repudiate your claims. Why block the subdomain? If it hosts static resources, what's the concern?).

      4 replies →

  • > When I get a text to a phishing site, I immediately report it to the safe browsing list.

    Please, don't do that. You're just giving more power to a private company (Google). It's so deceiving, I know: reporting/blocking malware sites is a good thing, but doing so via Google diminishes the returns so greatly that it's no longer worth it.

    • As opposed to what alternative? Google's safe browsing list is used by everyone, and is currently the gold standard. There exists no alternative. NextDNS uses it. Safari uses it. Firefox uses it.

      Yeah, I'm not feeling guilty about this, and I'll do it every time.

      Note that the list isn't like a spam list or something where bad actors can just flag something and get them blacklisted. When you report to the safe browsing list it is actually verified, and when it's a fake bank/netflix/Amazon/etc login, it's pretty easy for them.

      1 reply →

He talks about Google wielding too much power with this. Another example is how their spam filtering can pretty much prevent a business from being able to relay emails to any of their customers with a Gmail address. This has led many people to just outsource their mail relaying to companies like SendGrid, to lessen the chance of having their emails blocked by Google.

Most of Google's "safety" features are somewhat evil in some way. I don't want any of them, but some of them can't be disabled (like the one that can lock you out of your account even if you have the correct password).

  • Which one is that?

    • Sometimes Google doesn't recognize your device and then your password is not enough... even if you have second-factor authentication disabled. So if you don't have a second form of contact, like another phone number or another email for recovery, then you are fucked. Sometimes they even ask you for a previous password for recovery, so if you use a password manager that doesn't keep history, you might also be fucked.

      2 replies →

Why can't companies like google just have a warning and review period before taking actions like this?

  • It's absurd. This happens on the Play Store too. I've seen it happen multiple times due to pure mistakes. It takes an appeal and time to resolve the issue; in the meantime you are stuck.

    Their appeals form only lets you submit 1,000 characters, with no images or attachments. So in many cases, it's hard to even provide proof of the mistake. For example, if they falsely take down your app for trademark infringement, but you have priority rights in a country or a registered mark, how are you supposed to effectively prove that in 1,000 characters with no images? In one case, we had a decision from the trademark office in our favor, but we were unable to attach it in any way and had to try and summarize it in like 300 characters.

    There is no reason in most cases to not provide a warning period and the opportunity to provide evidence and exhibits.

    They act so much like a monopoly in this case that they are stupidly making things harder for themselves. Sundar and Google's legal team should take all the PMs aside and tell them they are going to start losing antitrust cases left and right if they can't provide more due process for decisions.

  • I have no extra knowledge on the subject, but if the flagged website was indeed serving malicious content, the brakes would have to come down pretty hard. If you have a review period you can end up serving malware to hundreds/thousands of people. Don't know how often this happens, though, and what the false positive rate is, it'd be interesting to see.

  • Reviews would have to be done by humans and humans doing things themselves is bad for the bottom line.

  • They don't even validate that blacklist entries actually contain an offending URL in the report. That's how much they care.

  • Because malware sites are practically ephemeral, pop up and disappear on short time frames. A review period wouldn’t do much except let them game the system even better.

  • That would probably cut a lot into their profits. Automating these tasks, even if some people get cancelled wrongly, is way cheaper than hiring people for reviews. They are so big that losing a few customers doesn’t mean much to them.

    I am waiting for the day when this happens to a large company. My company has more and more stuff on AWS. If Amazon cuts us off by accident the damage will quickly go into the billions.

  • 1) Google doesn't want any humans in the loop. Humans are expensive. Would sending a warning first result in more humans involved or less? More. So not gonna happen.

    2) Google claims any information given to exploiters of its rules and systems aids the next attempt. So they don't like to give out any information about what AI rule you tripped to get banned.

This happens when the ticket for breaking anti-monopoly laws is orders of magnitude cheaper than the profit you rake in by breaking them.

Wow. I wonder how long it will be before the Big Tech oligarchy starts blocking websites for “misinformation”.

Insane world we’re heading towards.

Wait until this is also applied to a list of domains from the SPLC and other groups to further censor “hate speech” on the internet.

Imagine a future where multiple big tech companies share “blacklists” of individuals and applications that should be banned across their networks. Your entire business and digital life could be snuffed out in an instant. Already seen it happen, now it just has to scale.

I wonder if a blockchain/bittorrent decentralized option could exist to replace google.

most people don't have billions lying around to compete, but you could reward people who rented out space for the indexing data, and have advertisements baked in that could maybe still use some retargeting but without tracking any personally identifiable data about a person.

Nodes could double as ai/cpu processing for algorithms related to search and storage. Computation and storage amounts could have their own payout per action, or per time on storage.

Most people have their computers on all the time anyway, so they could work in the background, earning their owners some side income while helping create a better internet.

Would need some centralization, I'd imagine, though. I think the problem with decentralization is the goal is ALL or nothing.

Like one or two big servers that tie everything to the rest and push 'updates' on algorithms, contracts, etc. to end users. Maybe a segregation index: knowing all airplane-related searches are indexed on cluster C, which has nodes 1-8, so you know where to go to get the info being searched.

I'm mainly a full-stack but 'dumb' developer, not an algorithms wiz, mostly focused on CRUD apps. But this would be fun to build.