Comment by arccy

4 months ago

If you're going to host user content on subdomains, then you should probably have your site on the Public Suffix List https://publicsuffix.org/list/ . That should eventually make its way into various services so they know that a tainted subdomain doesn't taint the entire site....

204 comments

arccy

0xbadcafebee 4 months ago

  In the past, browsers used an algorithm which only denied setting wide-ranging cookies for top-level domains with no dots (e.g. com or org). However, this did not work for top-level domains where only third-level registrations are allowed (e.g. co.uk). In these cases, websites could set a cookie for .co.uk which would be passed onto every website registered under co.uk.

  Since there was and remains no algorithmic method of finding the highest level at which a domain may be registered for a particular top-level domain (the policies differ with each registry), the only method is to create a list. This is the aim of the Public Suffix List.
  
  (https://publicsuffix.org/learn/)

So, once they realized web browsers are all inherently flawed, their solution was to maintain a static list of websites.

God I hate the web. The engineering equivalent of a car made of duct tape.

KronisLV 4 months ago
> Since there was and remains no algorithmic method of finding the highest level at which a domain may be registered for a particular top-level domain
A centralized list like this not just for domains as a whole (e.g. co.uk) but also specific sites (e.g. s3-object-lambda.eu-west-1.amazonaws.com) is both kind of crazy in that the list will bloat a lot over the years, as well as a security risk for any platform that needs this functionality but would prefer not to leak any details publicly.
We already have the concept of a .well-known directory that you can use, when talking to a specific site. Similarly, we know how you can nest subdomains, like c.b.a.x, and it's more or less certain that you can't create a subdomain b without the involvement of a, so it should be possible to walk the chain.
Example:
c --> https://b.a.x/.well-known/public-suffix b --> https://a.x/.well-known/public-suffix a --> https://x/.well-known/public-suffix
Maybe ship the domains with the browsers and such and leave generic sites like AWS or whatever to describe things themselves. Hell, maybe that could also have been a TXT record in DNS as well.
- IanCal 4 months ago
  
  > any platform that needs this functionality but would prefer not to leak any details publicly.
  I’m not sure how you’d have this - it’s for the public facing side of user hosted content, surely that must be public?
  > We already have the concept of a .well-known directory that you can use, when talking to a specific site.
  But the point is to help identify dangerous sites, by definition you can’t just let the sites mark themselves as trustworthy and rotate around subdomains. If you have an approach that doesn’t have to trust the site, you also don’t need any definition at the top level you could just infer it.
  
  5 replies →
- shadowgovt 4 months ago
  
  It does smell very much like a feature that is currently implemented as a text file but will eventually need to grow to its own protocol, like, indeed, the hostfile becoming DNS.
  One key difference between this list and standard DNS (at least as I understand it; maybe they added an extension to DNS I haven't seen) is the list requires independent attestation. You can't trust `foo.com` to just list its subdomains; that would be a trivial attack vector for a malware distributor to say "Oh hey, yeah, trustme.com is a public suffix; you shouldn't treat its subdomains as the same thing" and then spin up malware1.trustme.com, malware2.trustme.com, etc. Domain owners can't be the sole arbiter of whether their domain counts as a "public suffix" from the point of view of user safety.
- m0dest 4 months ago
  
  It looks like Mozilla does use DNS to verify requests to join the list, at least.
  $ dig +short txt _psl.website.one @1.1.1.1 "https://github.com/publicsuffix/list/pull/2625"
  Doing this DNS in the browser in real-time would be a performance challenge, though. PSL affects the scope of cookies (github.io is on the PSL, so a.github.io can't set a cookie that b.github.io can read). So the relevant PSL needs to be known before the first HTTP response comes back.
- IshKebab 4 months ago
  
  I presume it has to be a curated list otherwise spammers would use it to evade blocks. Otherwise why not just use DNS?
  
  3 replies →
lucideer 4 months ago

> God I hate the web
This is mostly a browser security mistake but also partly a product of ICANN policy & the design of the domain system, so it's not just the web.
Also, the list isn't really that long, compared to, say, certificate transparency logs; now that's a truly mad solution.
modeless 4 months ago
Show me a platform not made out of duct tape and I'll show you a platform nobody uses.
- vincnetas 4 months ago
  
  regular cars?
  
  15 replies →
- bell-cot 4 months ago
  
  Admitting I'm old, but my HP-11C still gets pretty-regular use.
  And judging by eBay prices, or the SwissMicros product line, I suspect I have plenty of company.
lukan 4 months ago
"The engineering equivalent of a car made of duct tape"
Kind of. But do you have a better proposition?
- jadengeller 4 months ago
  
  I'd probably say we ought to use DNS.
  
  2 replies →
- 0xbadcafebee 4 months ago
  
  Cookies shouldn't be tied to domains at all, it's a kludge. They should be tied to cryptographic keypairs (client + server). If the web server needs a cookie, it should request one (in its reply to the client's first request for a given url; the client can submit again to "reply" to this "request"). The client can decide whether it wants to hand over cookie data, and can withhold it from servers that use different or invalid keys. The client can also sign the response. This solves many different security concerns, privacy concerns, and also eliminates the dependency on specific domain names.
  I just came up with that in 2 minutes, so it might not be perfect, but you can see how with a little bit of work there's much better solutions than "I check for not-evil domain in list!"
  
  4 replies →
- gmueckl 4 months ago
  
  A part of the issue is IMO that browsers have become ridiculously bloated everything-programs. You could take about 90% of that out and into dedicated tools and end up with something vastly saner and safer and not a lot less capable for all practical purposes. Instead, we collectively are OK with frosting this atrocious layer cake that is today's web with multiple flavors of security measures of sometimes questionable utility.
  End of random rant.
  
  39 replies →
- Groxx 4 months ago
  
  I'm under the impression that CORS largely solves it?
  which is still much too new to be able to shut down the PSL of course. but maybe in 2050.
  
  1 reply →
jacquesm 4 months ago

I think we lost the web somewhere between PageRank and JavaScript. Up to there it was just linked documents and it was mostly fine.
formerly_proven 4 months ago

Why is it a centrally maintained list of domains, when there is a whole extensible system for attaching metadata to domain names?
a456463 4 months ago

I love the web. It's the corporate capitalistic ad fueled and govt censorship web that is the problem.
vladms 4 months ago
> God I hate the web. The engineering equivalent of a car made of duct tape.
Most of the complex thing I have seen being made (or contributed to) needed duct tape sooner or later. Engineering is the art of trade-offs, of adapting to changing requirements (that can appear due to uncontrollable events external to the project), technology and costs.
Related, this is how the first long distance automobile trip was done: https://en.wikipedia.org/wiki/Bertha_Benz#First_cross-countr... . Seems to me it had quite some duct tape.
- szszrk 4 months ago
  
  Why would you compare Web to that? A first fax message would be more appropriate comparison.
  Web is not a new thing and hardly a technical experiment of a few people any more.
  If you add the time since announcing the concept of Web to that trip date, you have a very decent established industry already. With many sport and mass production designs:
  https://en.wikipedia.org/wiki/Category:Cars_introduced_in_19...
  
  1 reply →
starfallg 4 months ago

That's the nature of decentralised control. It's not just DNS, phone numbers work in the same way.
ApolloFortyNine 4 months ago
All web encryption is backed by static list of root certs each browser maintains.
Idk any other way to solve it for the general public (ideally each user would probably pick what root certs they trust), but it does seem crazy.
- account42 4 months ago
  
  We already have a solution to solve it: DNS-based Authentication of Named Entities (DANE)
  This solution is even more obvious today where most certificates are just DNS lookups with extra steps.
thomasjb 4 months ago

What we need is a web made in a similar way to the wicker-bodied cars of yesteryear
jrochkind1 4 months ago

I'm not sure I'm following what inherent flaw you are suggesting browsers had that the public suffix list originators knew they had.
samlinnfer 4 months ago

Wait until you learn about the HSTS preload list.

CaptainOfCoit 4 months ago

I think it's somewhat tribal webdev knowledge that if you host user generated content you need to be on the PSL otherwise you'll eventually end up where Immich is now.

I'm not sure how people not already having hit this very issue before is supposed to know about it beforehand though, one of those things that you don't really come across until you're hit by it.

hu3 4 months ago
This is the first time I hear about https://publicsuffix.org
- btown 4 months ago
  
  You're in good company! From 12 days ago: https://news.ycombinator.com/item?id=45538760
no_wizard 4 months ago
I’ve been doing this for at least 15 years and it’s the first I heard of this.
Fun learning new things so often but I never once heard of the public suffix list.
That said, I do know the other best practices mentioned elsewhere
- foobarian 4 months ago
  
  First rule of the public suffix list...
  
  4 replies →
nickjj 4 months ago
Besides user uploaded content it's pretty easy to accidentally destroy the reputation of your main domain with subdomains.
For example:
1. Add a subdomain to test something out 2. Complete your test and remove the subdomain from your site 3. Forget to remove the DNS entry and now your A record points to an IP address
At this point if someone else on that hosting provider gets that IP address assigned, your subdomain is now hosting their content.
I had this happen to me once with PDF books being served through a subdomain on my site. Of course it's my mistake for not removing the A record (I forgot) but I'll never make that mistake again.
10 years of my domain having a good history may have gotten tainted in an unrepairable way. I don't get warnings visiting my site but traffic has slowly gotten worse over time since around that time, despite me posting more and more content. The correlation isn't guaranteed, especially with AI taking away so much traffic but it's something I do think about.
bo0tzz 4 months ago
The Immich domains that are hit by this issue are -not- user generated content.
- CaptainOfCoit 4 months ago
  
  They clearly are? It seems like GitHub users submitting a PR could/can add a `preview` label, and that would lead to the application + their changes to be deployed to a public URL under "*.immich.cloud". So they're hosted content generated by users (built application based on user patches) on domains under their control.
  
  2 replies →
fn-mote 4 months ago

Clearly they are not reading HN enough. It hasn’t even been two weeks since this issue last hit the front page.
I wish this comment were top ranked so it would be clear immediately from the comments what the root issue was.
cyberes 4 months ago

[flagged]
tonyhart7 4 months ago
so its skill issue ??? or just google being bad????
- yndoendo 4 months ago
  
  I will go with Google being bad / evil for 500.
  Google 90s to 2010 is nothings like Google 2025. There is a reason they removed "Don't be evil" ... being evil and authoritarian makes more money.
  Looking at you Manifest V2 ... pour one out for your homies.
  
  10 replies →

thayne 4 months ago

Looking through some of the links in this post, I there are actually two separate issues here:

1. Immich hosts user content on their domain. And should thus be on the public suffic list.

2. When users host an open source self hosted project like immich, jellyfin, etc. on their own domain it gets flagged as phishing because it looks an awful lot like the publicly hosted version, but it's on a different domain, and possibly a domain that might look suspicious to someone unfamiliar with the project, because it includes the name of the software in the domain. Something like immich.example.com.

The first one is fairly straightforward to deal with, if you know about the public suffix list. I don't know of a good solution for the second though.

smaudet 4 months ago
I don't think the Internet should be run by being on special lists (other than like, a globally run registry of domain names)...
I get that SPAM, etc., are an issue, but, like f* google-chrome, I want to browse the web, not some carefully curated list of sites some giant tech company has chosen.
A) you shouldn't be using google-chrome at all B) Firefox should definitely not be using that list either C) if you are going to have a "safe sites" list, that should definitely be a non-profit running that, not an automated robot working for a large probably-evil company...
- lucideer 4 months ago
  
  > I don't think the Internet should be run by being on special lists
  People are reacting as if this list is some kind of overbearing way of tracking what people do on the web - it's almost the opposite of that. It's worth clarifying this is just a suffix list for user-hosted content. It's neither a list of user-hosted domains nor a list of safe websites generally - it's just suffixes for a very small specific use-case: a company providing subdomains. You can think of this as a registry of domain sub-letters.
  For instance:
  - GitHub.io is on the list but GitHub.com is not - GitHub.com is still considered safe
  - I self-host an immich instance on my own domain name - my immich instance isn't flagged & I don't need to add anything to the list because I fully own the domain.
  The specific instance is just for Immich themselves who fully own "immich.cloud" but sublet subdomains under it to users.
  > *if you are going to have a "safe sites" list"
  This is not a safe sites list! This is not even a sites list at all - suffixes are not sites. This also isn't even a "safe" list - in fact it's really a "dangerous" list for browsers & various tooling to effectively segregate security & privacy contexts.
  Google is flagging the Immich domain not because it's missing from the safe list but because it has legitimate dangers & it's missing from the dangerous list that informs web clients of said dangers so they can handle them appropriately.
- thayne 4 months ago
  
  Firefox and Safari also use the list. At least by default, I think you can turn it off in firefox. And on the whole, I think it is valuable to have _a_ list of known-unsafe sites. And note that Safe Browsing is a blocklist, not an allowlist.
  The problem is that at least some of the people maintaining this list seem to be a little trigger happy. And I definitely thing Google probably isn't the best custodian of such a list, as they have obvious conflicts of interest.
  
  3 replies →
- pjc50 4 months ago
  
  It always has been run on special lists.
  I've coined the phrase "Postel decentralization" to refer to things where people expect there to be some distributed consensus mechanism but it turned out that the design of the internet was to email Jon Postel (https://en.wikipedia.org/wiki/Jon_Postel) to get your name on a list. e.g. how IANA was originally created.
- awesome_dude 4 months ago
  
  Oh god, you reminded me the horrors of hosting my own mailserver and all of the white/blacklist BS you have to worry about being a small operator (it's SUPER easy to end up on the blacklists, and is SUPER hard to get onto whitelists)
- shadowgovt 4 months ago
  
  There are other browsers if you want to browse the web with the blinders off.
  It's browser beware when you do, but you can do it.
- jonas21 4 months ago
  
  You can turn it off in Chrome settings if you want.
- knowriju 4 months ago
  
  If you have such strong feelings, you could always use vanilla chromium.
lucideer 4 months ago

> I don't know of a good solution for the second though.
I know the second issue can be a legitimate problem but I feel like the first issue is the primary problem here & the "solution" to the second issue is a remedy that's worse than the disease.
The public suffix list is a great system (despite getting serious backlash here in HN comments, mainly from people who have jumped to wildly exaggerated conclusions about what it is). Beyond that though, flagging domains for phishing for having duplicate content smells like an anti-self-host policy: sure there's phishers making clone sites, but the vast majority of sites flagged are going to be legit unless you employ a more targeted heuristic, but doing so isn't incentivised by Google's (or most company's) business model.
VTimofeenko 4 months ago

> When users host an open source self hosted project like immich, jellyfin, etc. on their own domain...
I was just deploying your_spotify and gave it your-spotify.<my services domain> and there was a warning in the logs that talked about thud, linking the issue:
https://github.com/Yooooomi/your_spotify/issues/271
liqilin1567 4 months ago
That means the Safe Browsing abuse could be weaponized against self-hosted services, oh my...
- sschueller 4 months ago
  
  New directive from the Whitehouse. Block all non approved sites. If you don't do it we will block your merger etc...
  
  1 reply →
fuzzy2 4 months ago

The second is a real problem even with completely unique applications. If they have UI portions that have lookalikes, you will get flagged. At work, I created an application with a sign-in popup. Because it's for internal use only, the form in the popup is very basic, just username and password and a button. Safe Browsing continues to block this application to this day, despite multiple appeals.
asddubs 4 months ago

Even the first one only works if there's no need to have site-wide user authentication on the domain, because you can't have a domain cookie accessible from subdomains anymore otherwise.

david_van_loon 4 months ago

The issue isn't the user-hosted content - I'm running a release build of Immich on my own server and Google flagged my entire domain.

mixologic 4 months ago
Is it on your own domain?
- david_van_loon 4 months ago
  
  Yes, my own domain.
goxofy 4 months ago

[dead]
crtasm 4 months ago
Is the subdomain named immich or something more general?
- david_van_loon 4 months ago
  
  The subdomain is "immich", which has crossed my mind as a potential flagging characteristic.
  
  2 replies →

827a 4 months ago

They aren't hosting user content; it was their pull request preview domains that was triggering it.

This is very clearly just bad code from Google.

antonvs 4 months ago

Or anticompetitive behavior.

aftbit 4 months ago

I thought this story would be about some malicious PR that convinced their CI to build a page featuring phishing, malware, porn, etc. It looks like Google is simply flagging their legit, self-created Preview builds as being phishing, and banning the entire domain. Getting immich.cloud on the PSL is probably the right thing to do for other reasons, and may decrease the blast radius here.

LennyHenrysNuts 4 months ago

The root cause is bad behaviour by google. This is merely a workaround.

bitpush 4 months ago
[flagged]
- NetMageSCW 4 months ago
  
  Please point me to where GoDaddy or any other hosting site mentions public suffix, or where Apple or Google or Mozilla have a listing hosting best practices that include avoiding false positives by Safe Browsing…
  
  4 replies →
- account42 4 months ago
  
  It's not a "service" at all. It's Google maliciously inserting themselves into the browsing experience of users, including those that consciously choose a non-Google browser, in order to build a global web censorship system.
- liquid_thyme 4 months ago
  
  >You might not think it is, but internet is filled utterly dangerous, scammy, phisy, malwary websites
  Google is happy to take their money and show scammy ads. Google ads are the most common vector for fake software support scams. Most people google something like "microsoft support" and end up there. Has Google ever banned their own ad domains?
  Google is the last entity I would trust to be neutral here.
- realusername 4 months ago
  
  The argument would work better if Google wasn't the #1 distributor of scams and malware in the world with adsense. (Which strangely isn't flagged by safe browsing, maybe a coincidence)
- udev4096 4 months ago
  
  [flagged]
  
  3 replies →
- delis-thumbs-7e 4 months ago
  
  [flagged]
  
  2 replies →
- 63stack 4 months ago
  
  [flagged]
  
  6 replies →

o11c 4 months ago

Is that actually relevant when only images are user content?

Normally I see the PSL in context of e.g. cookies or user-supplied forms.

dspillett 4 months ago
> Is that actually relevant when only images are user content?
Yes. For instance in circumstances exactly as described in the thread you are commenting in now and the article it refers to.
Services like google's bad site warning system may use it to indicate that it shouldn't consider a whole domain harmful if it considers a small number of its subdomains to be so, where otherwise they would. It is no guarantee, of course.
- thayne 4 months ago
  
  Well, using the public suffix list _also_ isolates cookies and treats the subdomains as different sites, which may or may not be desirable.
  For example, if users are supposed to log in on the base account in order to access content on the subdomains, then using the public suffix list would be problematic.
  
  1 reply →

jkaplowitz 4 months ago

In another comment in this thread, it was confirmed that these PR host names are only generated from branches internal to Immich or labels applied by maintainers, and that this does not automatically happen for arbitrary PRs submitted by external parties. So this isn’t the use case for the public suffix list - it is in no way public or externally user-generated.

What would you recommend for this actual use case? Even splitting it off to a separate domain name as they’re planning merely reduces the blast radius of Google’s false positive, but does not eliminate it.

snowwrestler 4 months ago
If these are dev subdomains that are actually for internal use only, then a very reliable fix is to put basic auth on them, and give internal staff the user/password. It does not have to be strong, in fact it can be super simple. But it will reliably keep out crawlers, including Google.
- jkaplowitz 4 months ago
  
  They didn't say that these are actually for internal use only. They said that they are generated either from maintainers applying labels (as a manual human decision) or from internal PR branches, but they could easily be publicly facing code reviews of internally developed versions, or manually internally approved deployments of externally developed but internally reviewed code.
  None of these are the kind of automatic user-generated content that the warning is attempting to detect, I think. And requiring basic auth for everything is quite awkward, especially if the deployment includes API server functionality with bearer token auth combined with unauthenticated endpoints for things like built-in documentation.

fc417fc802 4 months ago

How does the PSL make any sense? What stops an attacker from offering free static hosting and then making use of their own service?

I appreciate the issue it tries to solve but it doesn't seem like a sane solution to me.

arccy 4 months ago
PSL isn't a list of dangerous sites per-se.
Browsers already do various levels of isolation based on domain / subdomains (e.g. cookies). PSL tells them to treat each subdomain as if it were a top level domain because they are operated (leased out to) different individuals / entities. WRT to blocking, it just means that if one subdomain is marked bad, it's less likely to contaminate the rest of the domain since they know it's operated by different people.
- fc417fc802 4 months ago
  
  Marking for cookie isolation makes sense, but could be done more effectively via standardized metadata sent by the first party themselves rather than a centralized list maintained by a third party.
  Informing decisions about blocking doesn't make much sense (IMO) because it's little more than a speed bump for an attacker. Certainly every little bit can potentially help but it also introduces a new central authority, presents an additional hurdle for legitimate operators, introduces a number of new failure modes, and in this case seems relatively trivial for a determined attacker to overcome.

fukka42 4 months ago

This is not about user content, but about their own preview environments! Google decided their preview environments were impersonating... Something? And decided to block the entire domain.

ggm 4 months ago

I think this only is true if you host independent entities. If you simply construct deep names about yourself with demonstrable chain of authority back, I don't think the PSL wants to know. Otherwise there is no hierarchy the dots are just convenience strings and it's a flat namespace the size of the PSLs length.

andrewstuart2 4 months ago

Aw. I saw Jothan Frakes and briefly thought my favorite Starfleet first officer's actor had gotten into writing software later in life.

r_lee 4 months ago

Does Google use this for Safe Browsing though?

akerl_ 4 months ago

Looks like it? https://developers.google.com/safe-browsing/reference/URLs.a...

ZeWaka 4 months ago

Oh - of course this is where I find the answer why there's a giant domain list bloating my web bundles (tough-cookie/tldts).

BartjeD 4 months ago

There is no law appointing that organization as a world wide authority on tainted/non tainted sites.

The fact it's used by one or more browsers in that way is a lawsuit waiting to happen.

Because they, the browsers, are pointing a finger to someone else and accusing them of criminal behavior. That is what a normal user understands this warning as.

Turns out they are wrong. And in being wrong they may well have harmed the party they pointed at, in reputation and / or sales.

It's remarkable how short sighted this is, given that the web is so international. Its not a defense to say some third party has a list, and you're not on it so you're dangerous

Incredible

snowwrestler 4 months ago

I love all the theoretical objections to something that has been in use for nearly 20 years.
jtwaleson 4 months ago
As far as I know there is currently no international alternative authority for this. So definitely not ideal, but better than not having the warnings.
- BartjeD 4 months ago
  
  Yes but that's not a legal argument.
  You're honor, we hurt the plaintiff because it's better than nothing!
  
  3 replies →
- account42 4 months ago
  
  The alternative is to not do this.