
Comment by mcguire

5 years ago

Of particular note:

"Don't host any customer generated data in your main domains. A lot of the cases of blacklisting that I found while researching this issue were caused by SaaS customers unknowingly uploading malicious files onto servers. Those files are harmless to the systems themselves, but their very existence can cause the whole domain to be blacklisted. Anything that your users upload onto your apps should be hosted outside your main domains. For example: use companyusercontent.com to store files uploaded by customers."
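The quoted advice boils down to something like this minimal sketch (the domain names and helper are illustrative, with `companyusercontent.com` being the article's own example):

```python
# Sketch: serve user uploads from a sandbox domain, never the main one.
# Domain names and the key scheme are illustrative assumptions.
from urllib.parse import quote

MAIN_DOMAIN = "company.com"                     # app, marketing, login pages
USER_CONTENT_DOMAIN = "companyusercontent.com"  # untrusted uploads only

def upload_url(account_id: str, filename: str) -> str:
    """Build the public URL for a customer-uploaded file.

    The file lives under the sandbox domain, so if Google ever
    blacklists it, company.com is unaffected.
    """
    key = f"{account_id}/{quote(filename)}"
    return f"https://{USER_CONTENT_DOMAIN}/{key}"

print(upload_url("acct-42", "report final.pdf"))
# https://companyusercontent.com/acct-42/report%20final.pdf
```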

Pardon my ignorance, as I have only a few years of web dev experience. What exactly does it mean to store data on a domain? Does he mean serve data via a domain URL? And if so, how does Google discover that data?

  • Author here. Yes, "serve" is the correct interpretation. It is not clear how Google gets ahold of offending URLs within blacklisted domains (as the article says, no offending URLs were provided to us).

    Theories:

    * Obtained from users of Google Chrome that load specific URLs in their browsers

    * Obtained from scanning GMail emails that contain links to URLs

    * Obtained from third parties that report these URLs

    • The main way is via the Googlebot crawler.

      They also use user reports from Chrome, and links in "mark phishing" emails from Gmail. In those latter two cases the URL is considered private data, so it won't be reported in Webmaster Tools.


  • We’re pretty sure they get reports from Chrome. A security researcher at my workplace was running an exploit against a dev instance as part of their secops role and got the domain flagged, despite the site being an isolated and firewalled instance not accessible to the internet.

    • Yes, I have noticed this with brand-new dev domains that block crawlers via robots.txt: they don't show up in any Google search until I open the dev URL in Chrome, and then bam! their crawler starts trying to crawl the site, just from me opening the URL in Chrome.

      This is why I never use Chrome. They scrape the URLs that Chrome sends to Google Safe Browsing and just do not care about privacy.
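      For reference, a minimal crawler-blocking robots.txt of the kind mentioned above looks like this (it only deters well-behaved crawlers):

      ```
      User-agent: *
      Disallow: /
      ```

      As the anecdote suggests, Safe Browsing reports sent by Chrome would bypass this entirely, since they don't depend on a crawler honoring robots.txt.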


    • But that means they can't verify it, right? Couldn't a malicious actor use this to attack their competitors?

      Add an internal DNS entry for your competitor's domain, spin up an internal server hosting some malware and open it from chrome.

  • We use a fair number of Google products, and you can turn on a lot of enhanced protection, which many businesses do. This means even password-protected / private URLs may generate scans, from what I've seen. I'm not sure how they actually fingerprint files (maybe locally), but it seems pretty broad.

    This seems to work across a lot of Google products (Gmail, Drive, Chrome, etc.), so it scoops up a ton.

    More here:

    https://security.googleblog.com/2020/05/enhanced-safe-browsi...

    Not sure if this is related to Safe Browsing. We can also turn on more scanning and other features for all email users.

    The key point, though: if you allow users to PUT files onto your S3 buckets (even private / signed-in), then Google may scan them. That means if your user uploads a suspicious file to a trouble ticket system, and there IS a virus in there and Google sees it, wham. Obviously most folks will segregate those uploads off into their own S3 bucket by user/account to avoid contamination, but you really have to be careful not to host viruses AT ALL on your key domains.
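    A rough sketch of that segregation idea (bucket and domain names are made up; real presigned uploads would go through your cloud SDK):

    ```python
    # Sketch: route untrusted user uploads into a per-account area of a
    # dedicated uploads bucket, served from a sandbox domain -- never
    # from the bucket backing your main site. Names are illustrative.
    import posixpath
    import uuid

    UPLOADS_BUCKET = "example-user-uploads"    # quarantined bucket
    UPLOADS_DOMAIN = "exampleusercontent.com"  # sandbox serving domain

    def upload_key(account_id: str, filename: str) -> str:
        """Namespace each upload under its account, with a random
        prefix so one customer's file can't shadow another's."""
        safe_name = posixpath.basename(filename)  # strip path tricks
        return f"{account_id}/{uuid.uuid4().hex}/{safe_name}"

    def serving_url(key: str) -> str:
        # The file is only ever addressable via the sandbox domain.
        return f"https://{UPLOADS_DOMAIN}/{key}"

    key = upload_key("acct-7", "../../etc/invoice.pdf")
    print(serving_url(key))
    ```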

I imagine your service still won't have a great time when Google blacklists companyusercontent.com

A proper mitigation would be to serve user data from one domain per user, no?