Comment by Panzer04

3 years ago

It’s a little unsettling to think about how much information and knowledge is being locked up in walled-garden servers on discord, basically unsearchable (discord has a search feature, but it’s pretty awful). There are so many communities that end up moving to it because it serves their most engaged members so well, but it’s terrible for everyone else.

For example, “Voron” 3D printers are an awesome open-source design, but more and more I am directed to their discord to ask questions - many of which were, in all likelihood, asked dozens of times before. It’s great for their engaged members, who are all super helpful - but if it’s a reddit thread I can get my answer almost immediately, rather than asking, waiting and consuming someone else’s time for trivialities.

Sites like reddit at least can be readily searched from a conventional search engine, and can be crawled and stored externally in a pinch. Discord has its place, especially for game communities or other such personal things, but I’m not sure it’s ideal compared to a conventional forum as time passes and more information is built up and either lost or hidden away.

What began as effectively an IRC-like alternative + file hosting and voice support is now being used as a replacement for forums and I think that's where the issue is.

IRC isn't publicly searchable either unless someone was logging it and uploading the logs to some web server. IRC chats similarly often contain very useful info and answers.

Discord unfortunately doesn't have any native chat export feature so the best that can be done are third-party exporters, copy-pasting or screenshots which aren't ideal and don't end up being indexed as desired even if communities wanted them to be.

  • > What began as effectively an IRC-like alternative + file hosting and voice support is now being used as a replacement for forums and I think that's where the issue is.

    IRC was always an alternative/replacement for forums for many people (there was always the IRC vs forums debate for projects, gaming teams, etc.). Discord is just better IRC (Slack would have been this without the self sabotage via limited free accounts and sleeping on voice chat for years). Now it's more... Discord vs Reddit. But think of all the lost lore on those IRC servers or random private forums that are now gone.

    • Don’t forget the multiple-servers, multiple-logins hassle. I wasn’t particularly bothered until I got to about the dozen mark, at which point I started to resent any new Slack I had to interact with. Even with a password manager it was a shitty interaction, because you’re likely matching on the Slack TLD, not the per-Slack unique FQDN (and given that owners can rename the Slack and change that, it’s not a bad thing to have the Slack TLD be the login domain). Anyway, some people found this a problem immediately (likely those not using password managers), and you can tell that it’s been an issue given that Slack really pushed the Magic Link logins pretty quickly; they obviously felt the pain.

    • > there was always the IRC vs forums debate for projects

      Was there? IIRC for any non-tiny community the conclusion pretty much always was to use both, for different purposes.

  • > Discord unfortunately doesn't have any native chat export feature so the best that can be done are third-party exporters, copy-pasting or screenshots which aren't ideal and don't end up being indexed as desired even if communities wanted them to be.

    Server owners have the option for message logging bots. If communities wanted logging, they do have that option with server owner buy-in.

    Arguably, difficult-to-archive by default if you aren't a server owner helps foster a safer chat experience.

  • > now being used as a replacement for forums

    In Matrix there is an awful "threaded discussions" feature now, where collapsed forks branch off from the main chat flow. Which you have to separately manage. Keep a chat a chat, and a forum a forum.

    • Why do you think the threaded discussion feature is awful? I find it useful to keep track of multiple running conversations with someone or have a topic discussion within say a #Help room.

    • I always wanted to get this feature in chat, as my conversations (with a single person) often consist of a few separate, independent threads.

  • Exactly right.

    We're complaining about a problem that was no different in the heyday of IRC.

    Ultimately, publicly searchable information is a voluntary act.

    • > We're complaining about a problem that was no different in the heyday of IRC.

      IRC didn't pretend that you had information stored there, though...

      Discord hides this fact better by offering semi-permanence and a "search" feature, so it seems like it's a real-time forum, when in fact it's just slack with better video features...

      It's very different. The IRC of the yester-yore was filled with people sharing links to more-permanent docs, mailing lists. Discord is filled with people under the mistaken impression they have documented anything by having a chat with a search feature, (and this has happened to me) claiming that the "rules" are documented somewhere in a channel and why haven't I seen them, when the "rules" are in fact buried, no longer accessible, or just hidden in a wall of conflicting information.

      If you have an IRC, you have a dedicated community of people who try because they must, if you have a Discord you have a group of randos at best, or at worst a delusional echo chamber.

    • IRC can be accessed by any client adhering to IRC standards which are free and open.

      Discord can be accessed by any client adhering to Discord standards, which are closed and proprietary.

      Nobody has an obligation to publish information for public access, nor is free necessarily superior to proprietary or vice versa, but Discord is absolutely less accessible than IRC or HTTP(S) as an objective fact.

    • >We're complaining about a problem that was no different in the heyday of IRC.

      so the death of lore happened essentially when textual knowledge moved from the printed, physically stored version to the electronic version that has no easy way of going and getting the verifiable source of lore for reference when needed?

      sounds about right.

  • > file hosting

    That part is such a joke. I routinely run into the problem that I can't send someone a screencast because it's over EIGHT MEGABYTES. In 2023. And Discord insists on storing the bit-perfect copies of original images and videos too to add to the insult. I would be perfectly fine with them being compressed and/or stored temporarily but nah. I have to resort to cloud storage services like Yandex disk to go around that asinine limitation.

    Your files are too powerful, my ass.

    • > I routinely run into the problem that I can't send someone a screencast because it's over EIGHT MEGABYTES.

      Same. I can't just hit print screen and paste it into Discord, a single frame of my 3440x1440 resolution is too big for their free file size limit. I have to crop and/or scale it first. Just frustrating and an extra few steps.

  • Forums have the same problem of locking up knowledge, and they also keep out anyone who doesn’t want to constantly visit a web site.

    Mailing lists and newsgroups are the proper mechanisms for this. And you should be able to use them with a browser forum-style too.

    • While some forums do lock information behind registration-barriers, it's rare, whereas every discord is locked behind an email+cell phone barrier. It's far, far worse for anyone who doesn't want to be part of your community or install an app just to be able to read the docs or download a patch or ask a question.

    • That depends on the forum. Most forums could be searched and posts found via your search engine of choice. Well, the current crop of forums (discourse) prevents that by being JS-based, but I would not count it as a good forum anyway.

You’re seeing the symptom of something deeper.

Conventional search was a ~20 year solution to navigating the “entirety” of online content when the available content was within the scope of that innovation. That era is coming to an end. There’s just too much content to index literally and too much noise to quantify quality, and that problem is getting worse much faster than crawl+search technology can scale.

So new techniques for navigating content are emerging, some of them calling back to pre-search solutions.

LLM chat assistants drop the literal reference requirement by just mushing up all the sources they can and hallucinating something vaguely relevant to incoming questions. They lean into the noise and try to find patterns in it rather than sources.

Meanwhile, “walled garden” private communities like Discord, Slack, Whatsapp/iMessage, and the growing list of login-required social content sites commit to sharing literal source content but address the noise problem by regimenting and moderating how content is incorporated.

There will almost certainly be a next generation “meta-search” that can help you frame and make queries across these walled gardens, but it’s going to take a long while for the infrastructure and business models around that to establish themselves.

In the meantime, this is what we get and what we can expect for a while.

  • This is correct. Google became a monopoly and they stopped caring about surfacing any results that they couldn't immediately monetize. If your content is archived in a forum somewhere, Google won't find it anyway, it'll instead show you results from youtube, ads, and whatever is on top of their cache. Search is effectively dead, so we have to resort to asking other humans directly for answers. Which sucks, but that's the phase of the competition/monopoly cycle that we're in right now.

    • ignoring the problem of robots.txt inaccessibility, is it feasible to have Kagi-style "private google" with a more limited number of high-signal-to-noise sites, especially if you drop the concept of e-commerce and some other low-SNR feeds?

      perhaps one interesting thing is that a decent number of the highest-SNR feeds don't actually need to be crawled at all - wikipedia, reddit, etc are available as dumps and you can ingest their content directly. And the sources I am most interested in for my hobbies (technical data around cameras, computer parts, aircraft, etc) tend to be mostly static web-1.0 sites that basically never change. There's some stuff that falls in between, like I'm not sure if random other wikis necessarily have takeout dumps, but again, fandom-wiki and a couple other mega-wikis probably contain a majority of the interesting content, or at least a large enough amount of content you could get meaningful results.

      Another interesting one would be if you could get the Internet Archive to give you "slices" of sites in a google takeout-style format. Like they already have scraped a great deal of content, so, if I want site X and the most recent non-404 versions of all pages in a given domain, it would be fantastic if they could just build that as a zip and dump it over in bulk. In fact a lot of the best technical content is no longer available on the live web unfortunately...

      (did fh-reddit ever update again? or is there a way to get pushshift to give you a bulk dump of everything? they stopped back in like 2019 and I'm not sure if they ever got back into it, it wasn't on bigquery last time I checked. Kind of a bummer too.)

      I say exclude e-commerce because there's not a lot of informational value in knowing the 27 sites selling a video card (especially as a few megaretailers crush all the competition anyway), but there is lots of informational value in say having a copy of the sites of asus, asrock, gigabyte, MSI, etc for searching (probably don't want full binaries cached though).

      But basically I think there's probably like, sub-100 TB of content that would even be useful to me if stored in some kind of relatively dense representation (reddit post/comment dumps, not pages, same for other forum content, etc, stored on a gzip level5 filesystem or something). That's easily within reach of a small server, not sure if pagerank would work as well without all the "noise" linking into it and telling you where the signal is, but I think that's well within typical r/datahoarder level builds. And you could dynamically augment that from live internet and internet archive as needed - just treat it as an ever-growing cache and index your hoard.

    • I don't know about youtube, but duckduckgo showed me a forum among the top results just two days ago?

Cynically speaking, platforms like Discord are not interested in their content being widely searchable. If a user can satisfy their need for information by reading an existing answer (even within Discord), that user needs Discord less. The user depends less on there being an active community in Discord whose members could provide answers.

This all likely means fewer paying subscribers.

So, Discord search should be fine for a few months depth, and need not be good further into the past. Exposing historical data to external search engines is even less desirable.

Same of course applies to Slack, HipChat and whatever other commercial chat-like software.

  • I don't buy it: making discovery faster and more accessible may result in fewer hours spent on discord, but Discord as a whole becomes more valuable. If slack created some awesome feature that mined your company's chat logs to train something like the Librarian in Snow Crash that would be a massive selling point even if it means fewer chats between human beings. It's valuable because it produces an answer without taxing another human.

    • >I don't buy it: making discovery faster and more accessible may result in fewer hours spent on discord, but Discord as a whole becomes more valuable.

      This could be completely hidden to the Discord devs though. They could very well be slavishly improving various internal metrics. If one of those is time spent in the app, then I could easily see search ending up a lower priority. It doesn't even have to be intentional.

    • Look at stackoverflow. They made their information extremely accessible and now search engines are riddled with sites ripping the content directly from stackoverflow, rehosting it, and winning SEO on specific questions which bleeds advertising revenue.

      Discord, and 99.99% of companies under capitalism, do not care if they produce more value unless they get to keep that value.

> it because it serves their most engaged members so well, but it’s terrible for everyone else

I join a discord channel and honestly most of the time I’m overwhelmed. There’s often a ton of sub channels for every specific thing… that aren’t very active. It’s hard to get a feel for what is going on.

I join some only to find everyone is annoyed by my elementary question, but hell if I can find any answers in discord.

I just never know the lay of the land.

The handful of highly active people do, but that’s it.

  • Yup - the most engaged members are usually running the server, so they optimise it for their use. Multiple channels and categories make it easy to remember context if you switch often, and prevent recent relevant conversations from going out of scope too quickly.

> Discord has its place, especially for game communities or other such personal things

I'm not sure Discord is necessarily good for "gaming communities". I mean, Discord is live chat. This is good for some aspects: match making, news, etc. - anything that has a short lifespan. However, a lot of things about games don't. Wikis are perfect for publishing this info. Think of a Street Fighter type game. Characters have their move set, that doesn't change. Imagine having to search through discord for how to do a fireball with Ryu. Then there's strategy - that changes, albeit periodically, after tournaments, etc. By all means, this stuff can be discussed in discord, but the consensus strategies have to be published because it's a terrible experience to search through chat logs and follow along with long finished conversations over who knows how many posts, figure out context, etc. compared to just reading it on a wiki.

  • Agreed.

    I see Discord being used for things where forums (and non realtime conversations) used to rule supreme, like hobbies, tutorials, etc.

    Why? A tutorial is not a real time chat. A showcase of hobby projects isn't a real time conversation either. And the searchability of these tools like Discord is terrible. I really don't understand why this terrible thing became popular.

  • Those communities iirc nowadays operate a dual mediawiki + discord server. Dustloop (arcsys, Guilty Gear & BlazBlue mostly) for example didn't vanish entirely to Discord, the discord is instead just used to organize edits to the wiki if I recall.

    Discord is absolutely terrible at storing semi-structured information like a wikipage and I don't see them fix that without completely overhauling their entire service (although I'm sure they'll try and muck up the death of publicly available knowledge even more).

    • GP mentions "lore" that has not been deliberately condensed to a wiki. A forum works much better for lore: to search and reply years later to some very specific question / answer.

      (I guess wiki comments and especially talk pages can work too, but they are terrible and not the place for community discussion.)

  • I guess I should be clearer - I'm more referring to multiplayer communities rather than actual game-specific resources. Communities, rather than knowledge resources, are where it's appropriate, basically. Of course, these often end up bleeding into each other over time.

  • You would be surprised by the FGC entities that seemingly gravitate towards this very thing.

What's far more unsettling is knowing that Discord's sysadmins, as well as their acquirer (which for a minute looked like it might have been MSFT) have the complete plaintext logs of every DM conversation. Every private link, every NDA'd product info, every insider crypto tip, all the passwords and credentials, all the sexting, all the nudes.... and all linked to your real world identity via the non-VoIP phone number you have to add to your account to join most channel groups ("servers").

The trove of blackmail and extortion data alone is worth a few dozen millions. The insider crypto trading that Discord makes possible is worth probably $20-50mm USD each month.

  • And the mundane reality is that they won't use it for blackmail or in any such personalized fashion (except for government requests). What they will use it for, is training language models. That's the new way of monetizing "user-generated content", especially one you've managed to lock up so it can't be casually scraped by anyone else.

  • One of Discord's shareholders is Tencent (not Microsoft!!); however, the co-founders are still on the board and are highly occupied with day-to-day stuff. Message deletion on Discord is traceless from the outset (once deleted, no one, not even database admins, can retrieve deleted messages; the same thing also applies to the metadata, but not messages, of deleted users); in fact, this is one of the features they brag about. Traceless deletion is verified by both engineering-related blog posts and interactions with support.

Unfortunately, even the LLVM community has chosen a combination of Discord/Discourse, and deprecated their mailing list/IRC channel. This is a very unhealthy trend, and only the most ardent of communities such as Linux/Git/GCC stick to old-fashioned publicly-archived mailing lists [1].

[1]: https://public-inbox.org/README

Most of German immigration information and advice is locked into private Facebook groups. This information is not good nor reliable, but it's the only way to tell how things actually play out at the immigration office, for example.

That information would serve a lot more people if it was available to them with a simple search.

  • indeed very disappointing that most games have wikis/subreddits where one can get the gist of it served in a consumer friendly format but public services do not

    • Well, it's people sharing their personal experiences. I'm sure the government has a public website where they try to be helpful, but personal experiences of immigration are very useful, and just not really something the government can collate (often because they work against each other - people trying to 'game' immigration).

Reddit is still bad, not only because it's a platform, but also because it tends to lock threads after merely a couple of months (which prevents necroposting, which sometimes IS the right thing to do, while creating a new post is the wrong one) - so even if someone comes in later with a solution, they can't even answer the previous posters!

(and some of the new forums, seems like Discourse has it on by default?)

  • Hackernews seems to solve this by simply not notifying us of responses, so an ancient post that somehow gets responded to is just never noticed.

    • Actually, HN has the same issue as Reddit here: it locks threads.

      But now I wonder why duckduckgo never seems to show hn results (unless restricted to it of course)? I have searched hn before as it contains a lot of "Lore" as GP calls it, some of it quite helpful!

      I guess that hn is a bit to reddit what IRC is to Discord: because of its focus and lack of features, it's ironically better in this context because most people won't even try to use it for something more serious than "post-its on the fridge door".

Discord is nice because it lets you have small talk and build strong communities with a more natural cadence. In the same way we don't record every spoken conversation, I think we don't need to stress about discord going missing. I imagine any historically relevant outcomes of conversations on discord will be recorded outside of discord.

All that said, it's still important to have knowledge bases to reference I agree, but I think that should be an effort separate to Discord. I wouldn't want to search an arbitrarily long chat log to find answers to questions, it's not well suited for the task.

  • I would say in something like speedrunning there is tons of information that only exists in discord pinned threads etc., and in the combined heads of community members.

    To those who care about that hobby a lot of very useful information would vanish if discord went away. That's not a great state of affairs and discord is a very bad place to keep that information, even at the best of times it's not easy to find things in there. But I think it's probably the case for a decent number of communities.

    You could definitely argue that it's not historically relevant, since, if speedrunning as a whole disappeared it wouldn't really matter.

I will say what I feel like I say whenever this comes up: Discord could contribute to a solution here, even partially, by making Discord Forums search indexable. It would (theoretically) help with the "has my question been answered before" and it would (theoretically) make (some) archival efforts simpler.

I discovered years ago that Facebook had a trove of useful technical groups that are completely invisible because they are not indexable. I wonder how much knowledge we have lost because Discord and FB are the new forums.

I remember chatting on the RepRap 3D Printer forums in the early 2010s, everything searchable by Google and static text. That mode is long gone now, Discord's superior UX seems to have swallowed up most of the forums.

We've had the same thing happen with a local gaming club - we've pretty much wholesale moved from Facebook to Discord. Which is great for current members, but makes recruiting (which is sort of important in a college town) next to impossible.

> It’s a little unsettling to think about how much information and knowledge is being locked up in walled-garden servers

Not just on Discord, in general, everywhere: Facebook, Instagram, TikTok, Reddit and even here HackerNews.

It's not just about being searchable or long-lived, but that they can pull the rug anytime (or make stealth edits like the Reddit fiasco with their admin Spez)

How to solve this? Should the burden of searchability and archival be a requirement upon companies that provide social media services? Third-party bots already manually crawl all services and provide external search/archive interfaces (like the "Reddit Undelete" services) How to make that better?

> discord has a search feature, but it’s pretty awful

Their search makes me want to pull my hair out.

Why can't I just search the history with grep? That's a feature I would pay for (if anyone at Discord is reading this and wants my $10/month)

  • I'd be happy even if they just didn't hijack Command/Ctrl-F. You can't even search for text on the screen you're viewing without being dumped into their shitful pseudosearch.

  • Because it would be obscenely expensive (in terms of computational resources) to search history via grep. Hence why we use an inverted index.

    • This is megabytes (at most) of text we're talking about, not gigabytes. And ripgrep is absurdly fast. Grepping a 5MB text file should be pretty much instantaneous.
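
To make the grep-versus-index trade-off in this exchange concrete, here is a minimal Python sketch. It is only an illustration and says nothing about how Discord's search is actually built: the sample `messages` and the helper names (`tokenize`, `linear_search`, `build_inverted_index`, `index_search`) are all made up for this example. The linear scan reads every message on each query but accepts any regex; the inverted index answers whole-word lookups from a prebuilt table but cannot match arbitrary patterns.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a message and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def linear_search(messages, pattern):
    """grep-style scan: test every message against a regex on each query."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [i for i, msg in enumerate(messages) if rx.search(msg)]


def build_inverted_index(messages):
    """Map each token to the set of message positions that contain it."""
    index = defaultdict(set)
    for i, msg in enumerate(messages):
        for tok in tokenize(msg):
            index[tok].add(i)
    return index


def index_search(index, query):
    """Return positions of messages containing every token in the query."""
    toks = tokenize(query)
    if not toks:
        return []
    result = set(index.get(toks[0], set()))
    for tok in toks[1:]:
        result &= index.get(tok, set())
    return sorted(result)


# Hypothetical sample data, standing in for one channel's history.
messages = [
    "How do I level the bed on my printer?",
    "Bed leveling is covered in the pinned FAQ",
    "Anyone up for a match tonight?",
]
index = build_inverted_index(messages)
```

With these sample messages, `linear_search(messages, r"level(ing)?")` finds both bed-leveling messages, while `index_search(index, "level")` finds only the first, because the index stores whole tokens. That is roughly the flexibility the "just let me grep" request is after, at the cost of scanning everything per query.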

And you can't even filter out spammy channels like bot command channels. On mobile it's particularly awful where if you look for a message's context, it's very likely you'll lose your "place" in the search panel and have to scroll from the top all over.

  • You can filter a discord search by what channel the message is in.

    • But not if you don't. Large servers tend to have information spread across many different channels and if you want to find out about something that may be in different channels, you have to search individually in each of them and that gets unwieldy fast.

IRC was the same way for 15 or so years, all mostly lost.

  • IRC used to be indexed by Google and such though. Also because irc clients didn't have rich media support a lot of knowledge would make it outside of IRC like code snippets or microblogs that would eventually get indexed.

    • Google didn't "Index IRC". Source: have been on IRC before Google was a thing.

      What Google indexed was people's logger bots putting channel logs on the world wide web.

I'd say it's a symptom of how toxic the public internet (especially Twitter) has become. When there is an angry mob constantly scouring every community for transgressions (real or imagined), being unindexed and unsearchable is a feature not a bug.

That is if you get support at all. Many times I have not. At least on an issue tracker I can find other people who might have solved a similar problem on their own.

> more and more I am directed to their discord to ask questions - many of which were, in all likelihood, asked dozens of times before

It becomes painfully obvious and a little funny when the question triggers a bot to reply with a link to a pinned comment answering FAQ#36, and you see it happen a dozen times a day.

I wonder if setting up a discourse instance is too much friction, so that businesses are instead choosing real-time chat in spite of it being awful for knowledge sharing as a whole.

Maybe there's a need-gap for a low friction forum.

Even more unsettling, when you realise there must be a lot of children talking to a lot of adults, about who knows what, on discords invisible to parents and police.

Do people want to write comments and replies that are publicly searchable?

Discord's search works great for me

  • The search is per-server so before you even start you need to know which server has the thing you're looking for, which isn't always obvious, and you can't search a server without actively joining it, announcing your presence and using up your finite server slots. There's no equivalent to Googling something and passively pulling answers from wherever.

  • Unless i'm missing something (and noting the drawbacks mentioned elsewhere re. no google search), its native search doesn't have any kind of fuzziness. You can't think of some set of terms and have it bring up a particular thing, you have to know an exact word used in it.

    This means you need to know enough already just to get it to come up, and more to prune out the other 500 results if your term is generic enough. Basically, the search only really works if you want to find a specific conversation you remember (and it better be recent, given how easily you forget the specifics of things as time passes)

    • > Unless i'm missing something, its native search doesn't have any kind of fuzziness.

      For the past couple years it's returned variations of words, eg: `installer` will return `installers`, `installed`, `installs`, `installing`. However it doesn't return synonyms. It's essentially just searching like one would with regular chat logs, except limited to searching only one server at a time.

      I actually wish I could disable such matching since even with double quotes I can't get only an exact word matched (which is necessary when trying to filter a large number of results).
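
The word-variation matching described above behaves like stemming: query and message words are reduced to a common root before comparison. Here is a rough Python sketch of that idea, with an exact-match switch like the one wished for. The suffix list and the names `naive_stem` and `matches` are invented for illustration, and this is a guess at the general technique, not Discord's actual implementation.

```python
# Hypothetical suffix list; real stemmers use far more careful rules.
SUFFIXES = ("ing", "ers", "er", "ed", "s")


def naive_stem(word):
    """Strip the first matching suffix, keeping a root of at least 3 letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(query, message, exact=False):
    """Check whether a one-word query matches any word of a message.

    exact=False compares stems, so word variations match each other;
    exact=True only matches the literal word.
    """
    words = message.lower().split()
    if exact:
        return query.lower() in words
    return naive_stem(query) in {naive_stem(w) for w in words}
```

Under these rules `installer`, `installers`, `installed`, `installs` and `installing` all reduce to `install`, so `matches("installer", "I installed it yesterday")` is true, while the same call with `exact=True` is false. Synonyms still never match, since nothing here knows about meaning.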

Walled gardens are unfortunately the future of the internet.

The public web is full of bots and adversarial content. Worse, anything you contribute in good faith can be used against you in the future by businesses and governments. Even in the rare case where those institutions are trustworthy, there is no guarantee of them being that way. So, the public web’s only power users are those who seek to influence others, who are therefore adversarial toward any higher minded purpose.

Balkanisation and fragmentation, unfortunately, seem to be at least our near future.

  • But how does this overlap with federated platforms? Then you can still balkanize into tiny groups but federate among compatible groups to rebuild larger networks bottom-up (I like to draw a comparison to how multicellular organisms are composed of many discrete cells). And if you don't federate with hosts that serve businesses, you can fly way under the radar (Pleroma even has built-in onion routing support IIRC).

> discord has a search feature, but it’s pretty awful

I don't think it's bad at all, why do you say it's awful? I feel like you think it's bad because you have a negative bias towards discord

  • It only really works for exact matches. If I have a general idea of what I'm looking for, I can usually hunt it down with google. Discord will give me too many results or no results, which sucks.

    I like discord - It makes up a very significant (honestly, majority) portion of my social life. But it's not good for any kind of real information storage (FAQ, guides, expert answers and so on) compared to a forum.