So You Want to Define a Well-Known URI

11 hours ago (mnot.net)

I wish people would follow this, instead of coming up with new standards in the root namespace. "llms.txt" [1] comes to mind, for example.

Let's stop polluting the root of a domain!

[1] https://llmstxt.org/

  • LLMs.txt is also nonsense since it isn't adopted by any of the major AI players.

    • Google has recently added `llms.txt` to Chrome's Lighthouse check for agentic browsing (https://searchengineland.com/google-llms-txt-chrome-lighthou...), so adoption may be coming. Admittedly, I put more faith in

        <link rel="alternate" type="text/markdown" href="https://example.com/foo.md" title="Markdown version of the &lt;Foo&gt; page">
      

      that I copied from Gwern.net. This convention is discoverable (just read the HTML) and naturally adapts to any website size and structure.

      I have created an `llms.txt` for my website anyhow. I use a fixed LLM prompt to generate it from the internal links in `index.md`.

    • To be fair, "not adopted by any major AI player" is probably the most web-standard-compliant phase of a new web standard.
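
For anyone curious what "just read the HTML" looks like for the `<link rel="alternate" type="text/markdown">` convention mentioned in this thread, here is a minimal sketch using only Python's standard library. The class and function names (`AlternateLinkFinder`, `find_markdown_alternates`) are made up for illustration, and it assumes the page HTML has already been fetched.

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collects href values of <link rel="alternate" type="text/markdown"> tags."""

    def __init__(self):
        super().__init__()
        self.markdown_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        # Attribute values may be None (valueless attributes), so default to "".
        rels = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        href = attr.get("href")
        if "alternate" in rels and mime == "text/markdown" and href:
            self.markdown_links.append(href)


def find_markdown_alternates(html):
    """Hypothetical helper: return all Markdown alternate URLs found in html."""
    finder = AlternateLinkFinder()
    finder.feed(html)
    return finder.markdown_links


page = (
    '<html><head>'
    '<link rel="alternate" type="text/markdown" '
    'href="https://example.com/foo.md" title="Markdown version">'
    '</head></html>'
)
print(find_markdown_alternates(page))
```

A relative `href` would still need to be resolved against the page URL (for example with `urllib.parse.urljoin`); the sketch leaves that out.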

No, in fact I don't. But this post wouldn't be of any help anyway. It feels like it's about nothing, there is no substance, just stating some obvious facts. Without examples that lead to some real recommendations, this whole expertise claimed by the author is of no use.

Does a change-password registry actually get used, even by bots? I don't see bots checking for a .well-known/change-password url on my sites. It seems a good place to put public configs, just to have a place for them, but not as a means of discovery.

  • Some password managers, such as Chrome's, offer a "change password" button in the UI that informs the user if their password has been compromised. This is based on .well-known/change-password.
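
As a rough sketch of the client side of this: a password manager only needs the site's origin to know where to send the user, because the path is fixed. The function name `change_password_url` is hypothetical, and this shows only the URL construction, not how any particular browser implements it.

```python
from urllib.parse import urlsplit, urlunsplit

# Fixed path of the well-known URL for changing passwords.
WELL_KNOWN_PATH = "/.well-known/change-password"


def change_password_url(site_url):
    """Hypothetical helper: map any URL on a site to its change-password URL."""
    parts = urlsplit(site_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {site_url!r}")
    # Keep scheme and host (including any port); drop path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, WELL_KNOWN_PATH, "", ""))


print(change_password_url("https://example.com/account/settings?tab=security"))
```

The site is then expected to redirect that URL to its real password change page, which is why nothing more than the origin is needed up front.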

Why are they so specific?

Why password-reset instead of a more generic link tree?

Why discord domain verification instead of domain-verifications with a dynamic list on entries?

Seems like a waste of time. I would just define my own spec outside of well known for my use case.

  • Your own spec wouldn't be used by anyone else.

    The change-password well-known URL (the "password-reset" one you mention) is used by password managers to show a "Change password..." button in their interface, which links to the site's actual password change page via a redirect from that well-known URL.

    • If the website implements it. What about email preferences? Removing account links? There are many use cases you might want to redirect a user to, and having to define a separate well-known URI for each one seems dumb compared to using a more generic one. I guess the more flexible it is, the harder adoption becomes, as usage within a spec might diverge, or it grows outside of the spec and becomes unofficial. So maybe password-reset is the correct level of specification.

      Anyway, Discord can just tell people in its onboarding docs to put the domain verification anywhere. It being well-known does nothing. If there was a root-level domain verification, then you might as well put it under that. But otherwise why go through a process?

  • > Why discord domain verification instead of domain-verifications with a dynamic list on entries?

    The TXT record itself is already a dynamic list of entries. It's far simpler and easier to iterate through the list and compare the start of each value with your search string until you find "discord domain verification" directly than it would be to do anything else.

    Example:

        ;; ANSWER SECTION:
        ycombinator.com.        300     IN      TXT     "openai-domain-verification=dv-QbhxxK0G0JK0dnyZ4YTsNAfw"
        ycombinator.com.        300     IN      TXT     "v=spf1 include:_spf.google.com include:mailgun.org a:rsweb1-36.investorflow.com include:_spf.createsend.com include:servers.mcsv.net -all"
        ycombinator.com.        300     IN      TXT     "MS=ms37374900"
        ycombinator.com.        300     IN      TXT     "anthropic-domain-verification-0qe2ww=yK576oHdDgyTcXgkPfj1KXgGt"
        ycombinator.com.        300     IN      TXT     "ZOOM_verify_2ndw8KZxSRa8PT8NmdyXvw"
        ycombinator.com.        300     IN      TXT     "google-site-verification=KsI69Y_jEVkp4eXqSQ9R9gwxjIpZznvuvrus6UolB9Y"
        ycombinator.com.        300     IN      TXT     "ca3-4861b957e83847c188e45d04ec314ee3"
        ycombinator.com.        300     IN      TXT     "apple-domain-verification=WG0sP5Alm7N6h1Te"
        ycombinator.com.        300     IN      TXT     "dropbox-domain-verification=asc63coma4mv"
        ycombinator.com.        300     IN      TXT     "google-site-verification=GJKdQskycEclAGPua3yXB9m_nVhxbrsVps_y-t9SXV0"
        ycombinator.com.        300     IN      TXT     "Wayback verify for support request 741082"
        ycombinator.com.        300     IN      TXT     "google-site-verification=rivq8jKu6AADGtbbEzJhmOpcqq08B7QxIzXxYV8DtyU"
        ycombinator.com.        300     IN      TXT     "rippling-domain-verification=a660f7a4ab77a3de"

    • Having all those TXT records at the domain apex like that makes the TXT query reply huge, which affects, for instance, every mail recipient who merely wants to check the SPF record. This is a bad pattern to follow.

    • Literally the inner platform effect. We have multiple kinds of DNS record. Let's use them instead of creating a key value store inside a key value store.

    • Domain verifications leak information that they shouldn't - it should be "random key.domain.com in TXT randomkey"

    • "Domain-verifications" is an invitation for everyone else that might need it to use the same standard and convention. "Discord-domain-verification" is not, it's what feels like polluting the global namespace with the company name that might cease to exist in a few years.

      At the very least, it should be "domain-verification-discord", "-google" and so on. Maybe even "-com.discord", "-com.google"? And the first part clearly standardized and registered, instead of one entity using "domain" and another one "site".
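
The "iterate through the list and compare the start of each value" approach described earlier in this thread can be sketched in a few lines. This assumes the TXT values have already been fetched by some resolver (Python's standard library has no TXT lookup), and the record values and the `example-domain-verification=` prefix below are made up for illustration.

```python
def find_verification(txt_values, prefix):
    """Return the token following prefix in the first matching TXT value, or None."""
    for value in txt_values:
        if value.startswith(prefix):
            return value[len(prefix):]
    return None


# Hypothetical TXT values for a domain apex, already stripped of quotes.
records = [
    "v=spf1 include:_spf.example.net -all",
    "example-domain-verification=abc123",
    "MS=ms00000000",
]

print(find_verification(records, "example-domain-verification="))
print(find_verification(records, "discord-domain-verification="))
```

The same loop works unchanged if the verification record lives on a dedicated subdomain instead of the apex, which is the alternative other replies argue for; only the name being queried changes.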

The consideration about having more than one of them on a domain seems like something that's often overlooked.

.well-known started tidy and quietly became the junk drawer of the web root. security.txt, ACME, app-site-association, and counting.

How well-known are those URIs though? :-\

  • I spent 10 minutes searching for one in the article, in the RFC, in the wikipedia page, on google, to search for a .well-known example. Couldn't find one.

    I did read one before while working with GitHub OIDC, and I did find it very useful.

    What is it with technical documentations that go deep describing what it is in plenty words but refusing to give a single example? This is far from the first case I've run into, either.

    • > I spent 10 minutes searching for one in the article, in the RFC, in the wikipedia page, on google, to search for a .well-known example. Couldn't find one.

      I don't know how that can be, since you claim to have found the RFC; the RFC straight-forwardly states,

      > 5. IANA Considerations

      > This specification updates the registration procedures for the "Well-Known URI" registry, first defined in [RFC5785]; see Section 3.1.

      & then of course directs IANA to establish a registry. We'd expect this section, given the very nature of the RFC is that it establishes a collection of things, so that there is an IANA considerations section should be wholly unsurprising…

      If you see the linked section…

      > The "Well-Known URIs" registry is located at <https://www.iana.org/assignments/well-known-uris/>.

      And there's a link to a listing of every standardized .well-known URI there is.

      > What is it with technical documentations that go deep describing what it is in plenty words but refusing to give a single example?

      The RFC provides an example in the form of "example", but also in the form of "robots.txt" (as a "it could have used this, had this existed", but what else could it have done?).

    • I've been setting up some federated servers (Matrix, ActivityPub) and I ran into .well-known/ paths in many of them: a WebFinger resolver for ActivityPub and a more custom Matrix server-to-server federation endpoint.

  • Slightly less well-known than XDG directories among the developers of Linux-targeted software, it would seem.

    Seriously, what an oxymoronic name. "/index.html" is a well-known URL, literally: most web developers are aware of it. But inventing a bunch of URLs with predefined semantics and then slapping the "well-known" label on them... well, it won't magically make them actually well-known.
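
Since a concrete example was asked for in this thread: WebFinger (RFC 7033) is one of the registered well-known URIs, and the lookup URL for an account can be built as in this sketch. The function name `webfinger_url` and the `alice@example.com` account are made up for illustration.

```python
from urllib.parse import urlencode


def webfinger_url(account):
    """Hypothetical helper: build the WebFinger lookup URL for user@host."""
    account = account.lstrip("@")  # tolerate the "@user@host" handle style
    user, sep, host = account.rpartition("@")
    if not sep or not user or not host:
        raise ValueError(f"expected user@host, got {account!r}")
    # The account goes into the "resource" query parameter as an acct: URI.
    query = urlencode({"resource": f"acct:{account}"})
    return f"https://{host}/.well-known/webfinger?{query}"


print(webfinger_url("alice@example.com"))
# prints https://example.com/.well-known/webfinger?resource=acct%3Aalice%40example.com
```

Fetching that URL from a server that supports WebFinger returns a JSON document describing the account; the sketch stops at building the URL.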

One disappointment you can't help but feel, having worked in technology a while, is about how people solve the same problems over and over in redundant and subtly incompatible ways.

How do you associate metadata with a public name? A SRV record! No, a TXT record! No, a meta tag! No, data attributes! No, an X.509 attribute! No, a random file at top level! No, a well known file under some schema! No, ...

It goes on forever. We're left with a mishmash of mechanisms and lowest common denominator support for them all.

It would be nice if we picked one extension mechanism and maximally enhanced it rather than having everyone invent his own.

I wish we had one for navigation layout of a site so browser chrome could render that in a consistent way. It would also be a boon for a11y.

I'm not sure I like `https://domain.com/.well-known/robots.txt` any better frankly

  • Whoever decided it would be a good idea for ".well-known" to be a "hidden" directory is a complete fool. All it does is provide the opportunity for confusion, misconfiguration, skipped backups, missed git check-ins, forgotten updates and more. Literally the only person a folder like that is hidden from is whoever is managing the web server.

    Sure, if everyone knows what they're doing, it's not a problem. But we all know how long that assumption lasts.

    • I think the blog author is the one who wrote the original RFC. To be fair to him, there once was a time web servers were more commonly thought of as truly being remote directories of files you can view or link to, not just domains the browser hides the rest of, and dotfiles would commonly act like dotfiles in local file listings. Nowadays, the assumption is if you go to the base URL it should only ever serve the default page and if you try to go to a directory it should throw an error. Well, unless you're one of those ancient sites like https://ftp.mozilla.org/

      I'm not saying it's good or bad how things turned out, but the choice of a dotfile for this sure did not pan out well, as the web went in the exact opposite direction from the one where it would have been relevant.

    • The main point of consideration here probably was how to avoid conflicts with URLs of existing sites, not exactly people who aren't able to serve an endpoint with a dot within its path...

    • TBF, those people will already have hit problems with their Apache configuration and fixed their tooling long before the lack of .well-known gives them any trouble.