This keeps me up at night too. I’d like to stake the position that LLMs are antagonistic to the (beleaguered) idea of an internet.
LLMs increase the burden of effort on users to successfully share information with other humans.
LLMs are already close to indistinguishable from humans in chat; bots are already better at persuading humans[1]. This suggests that users who feel ineffective at conveying their ideas online are better served by having a bot do it for them.
All of this effectively puts a fitness function on online interactions, increasing the cognitive effort required for humans to interact or be heard. I don't see this playing out in a healthy manner. The only steady state I can envision is one where we assume that we ONLY talk to bots online.
Free speech and the marketplace of ideas see us bouncing ideas off each other; that is our way of refining our thoughts and forcing ourselves to test them. This conversation is meant to be the bedrock of democratic societies.
It does not envisage an environment where the exchange of ideas is with a bot.
Yes yes, this is a sky is falling view - not everyone is going to fall off the deep end, and not everyone is going to use a bot.
In a funny way, LLMs will outcompete average forum critters and trolls for their ecological niches.
[1] (https://arxiv.org/pdf/2505.09662)
> increasing the cognitive effort required for humans to interact or be heard. I don't see this playing out in a healthy manner
We are at the stage where it's still mostly online, but the first places this will leak into the real world in a big way are easy to guess: job applications, college applications, loan applications, litigation. The small percentage of people who are savvy, naturally inclined towards being spammy, and able to afford any relevant fees will soon be responsible for over 80 percent of all traffic, not only drowning out others but also overwhelming services completely.
Fees will increase, then the institutions involved will use more AI to combat AI submissions, etc. Schools, banks, and employers will also attempt to respond by networking, so that no one looks at applicants directly any more; they simply reject anyone another institution has already rejected. Other publishing, from calls for scientific papers to poetry submissions, progresses much the same way under the same pressures, but the problem of being flooded with junk isn't so new there and the stakes are also a bit lower.
In Peter Watts' Maelstrom (2002) it's ultimately self-replicating code that pushes the internet from a brutal, rough, and competitive infoscape into something worse and even more rawly aggressive. But the book and its tattered wasteland of an internet still have such tone-setting power for me; they set up an image of an internet after humans, where the competing forces of exploitation have degraded and degraded and degraded the situation, pushing humans out.
Recently revisited on Peter's blog: https://www.rifters.com/crawl/?p=11220
The only way to solve it for decentralized messaging systems is a decentralized system for verification of identities, based on chains of trust and the use of digital signatures by default. It must be a legal framework supported by technical means. For example, ID providers may be given a responsibility to confirm certain assumptions about their clients (is a real human, is an adult, etc.) while keeping their identity confidential. The government and the corporations will know only what this person allows the ID provider to disclose (unless there's a legal basis for more, like a court's decision to accept a lawsuit or a court order to identify a suspect or witness). An ID provider can issue an ID card that can be used as an authentication factor. As long as a real person can be confirmed behind the nickname or email address, the cost of abuse will be a permanent ban on a platform or on a network. Not many people will risk it. Natural candidates for ID providers would be notaries.
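To make the attestation idea above concrete, here is a minimal sketch in Python, assuming an Ed25519 keypair held by a hypothetical ID provider and using the `cryptography` package. The function names and claim format are invented for illustration; a real framework would also need revocation, standard claim schemas, and the legal backing described above.

```python
# Minimal sketch (not a real framework): a hypothetical ID provider signs a
# privacy-preserving claim ("this pseudonym belongs to a verified adult human")
# and a platform verifies the signature without learning who the person is.
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

provider_key = Ed25519PrivateKey.generate()   # held by the ID provider
provider_pub = provider_key.public_key()      # published for platforms to use

def issue_attestation(pseudonym: str) -> tuple[bytes, bytes]:
    """Provider-side: sign only the claims the user agreed to disclose."""
    claim = json.dumps({"pseudonym": pseudonym,
                        "is_human": True,
                        "is_adult": True}).encode()
    return claim, provider_key.sign(claim)

def platform_accepts(claim: bytes, signature: bytes) -> bool:
    """Platform-side: trust the claim iff the provider's signature checks out."""
    try:
        provider_pub.verify(signature, claim)
        return True
    except InvalidSignature:
        return False

claim, sig = issue_attestation("night_owl_42")
print(platform_accepts(claim, sig))         # True
print(platform_accepts(claim + b"x", sig))  # False: a tampered claim is rejected
```

The point of the sketch is that the platform only ever sees the pseudonym and the signed claims; the mapping to a real person stays with the provider, to be disclosed only on a legal basis. The "cost of a permanent ban" comes from the provider declining to re-attest a banned person, which the sketch does not model.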
Yes, I think we'll see the rise of id-verified online communities. As long as all the other members of the community are also id-verified, the risk of abuse (bullying, doxing, etc) is minimized. This wouldn't stop someone from posting AI-generated content, but it would tend to suppress misinformation and spam, which arguably is the real issue. Would people complain about AI-generated content that is genuinely informative or thought-provoking?
Verification does not stop harassment or bullying.
It will not stop misinformation either.
Verification is expensive and hard, and currently completely spoof-able. How will a Reddit community verify an ID? In person?
If Reddit itself verifies IDs, then nations across the world will start asking for those IDs and Reddit will have to furnish them.
Having links in comments has always been problematic.
For myself, I usually link to my own stuff; not because I am interested in promoting it, but as relevant backup/enhancement of what I am writing about. I think that a link to an article that goes into depth, or to a GitHub repo, is better than a rough (and lengthy) summary in a comment. It also gives others the opportunity to verify what I say. I like to stand behind my words.
I suspect that more than a few HN members have written karmabots, and also attackbots.
Next, I'm sure, you'll be telling me you're not a bot, Mr Marshall?
https://news.ycombinator.com/item?id=42353508
Love it!
Thanks!
I recall blogs from over 20 years ago with blatant comment spam, where the blog author would respond to the comment spam individually as if it came from real readers. Most authors didn't fall for that, but a few clearly didn't understand what it was.
I'm not sure LLMs deviate from the long-term trend of increasing volume of information production. They certainly do break the little bubble we had from the early 1990s until 2022/3, where you could figure out you were talking to a real human based on the sophistication of the conversation. That was nice, as was Usenet before the spammers.
There is a bigger question of identity here. I believe the mistake is to go down the path of photo ID, voice verification, and video verification (all trivially bypassable now). Take it a step further with Altman's eyeball thing and it's another mistake, since a human can always be commandeered by a third party. In the long term, do we really care whether the person we are talking to is real or an AI model? Most of the conversations generated in the future will be AI. They may not care.
I think what actually matters more is some sort of larger history of identity and ownership, attached to whatever persona one wishes to present (I see no problem with multiple IDs, nicks, avatars). What does this identity represent? In a way, proof of work.
Now, when someone makes a comment somewhere, if it is just a throwaway spam account there is no value. Sure, spammers can and will do all of the extra work to build a fake identity just to promote some bullshit product, but that already happens with real humans.
> That was nice, as was usenet before spammers.
Not so sure I'd call it "nice."
I am ashamed to say that I was one of the reasons that it wasn't so "nice."
I think that, ultimately, systems that humans use to interact on the internet will have to ditch anonymity. If people can't cheaply and reliably distinguish human output from LLM output, and people care about only talking to humans, we will need to establish authenticity via other mechanisms. In practice that means PKI or web of trust (or variants/combinations), plus reputation systems.
Nobody wants this, because it's a pain, it hurts privacy (or easily can hurt it) and has other social negatives (cliques forming, people being fake to build their reputation, that episode of Black Mirror, etc.). Anonymity is useful like cash is useful. But if someone invents a machine that can print banknotes that fool 80% of people, eventually cash will go out of circulation.
I think the big question is: How much do most people actually care about distinguishing real and fake comments? It hurts moderators a lot, but most people (myself included) don't see this pain directly and are highly motivated by convenience.
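As a rough illustration of the "web of trust plus reputation" direction, here is a toy sketch in which trust decays with each hop from accounts you have verified yourself. The endorsement graph, the hop limit, and the 0.5 decay factor are all invented assumptions, not a real protocol; a deployed system would also check a signature on every endorsement.

```python
# Toy sketch of "web of trust + reputation": trust decays with each hop from
# the accounts I have verified myself. Graph, hop limit, and decay factor are
# illustrative assumptions only.
from collections import deque

endorsements = {            # who vouches for whom (assumed already signature-checked)
    "me":    ["alice", "bob"],
    "alice": ["carol"],
    "carol": ["dave"],
    "bob":   ["eve"],
}

def trust_scores(root: str, decay: float = 0.5, max_hops: int = 3) -> dict[str, float]:
    scores = {root: 1.0}
    queue = deque([(root, 0)])
    while queue:
        user, hops = queue.popleft()
        if hops >= max_hops:
            continue
        for peer in endorsements.get(user, []):
            score = scores[user] * decay
            if score > scores.get(peer, 0.0):   # keep the best path to each peer
                scores[peer] = score
                queue.append((peer, hops + 1))
    return scores

print(trust_scores("me"))
# {'me': 1.0, 'alice': 0.5, 'bob': 0.5, 'carol': 0.25, 'eve': 0.25, 'dave': 0.125}
```

A platform could then weight or hide comments from accounts below some trust threshold, which is where the reputation-system half of the idea comes in.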
We will ditch anonymity, but for pseudonymity, not eponymity. Meaning someone, somewhere will know who is who and can attest that 1000 usernames are humans, but people will be able to identify with just a username to everyone else, except that one person.
>In practice that means PKI or web of trust (or variants/combinations), plus reputation systems.
Yep, that is the way.
LLMs will also help us create new languages, or dialects of existing languages, with the purpose of distinguishing the inner group of people from the outer group of people, and from the outer group of LLMs as well. We have been in a language arms race for that particular purpose for thousands of years. Now LLMs are one more reason for the arms race to continue.
If we focus, for example, on making new languages or dialects which sound better to the ear, LLMs have no ears; it is always humans who will be one step ahead of the machine, provided that the language evolves non-stop. If it doesn't evolve all the time, LLMs will have time to catch up. Ears are some of the most advanced machinery in our bodies.
BTW, I am currently writing a program which takes a book written in Ancient Greek and automatically creates an audiobook or videobook using Google's text-to-speech, the same one used on the Google Translate website.
I think people in the future will be addicted to how new languages sound or can be sung.
Well if I’m in a discussion I’d like to know whether the other participants are actual people or just slopmachines (“AIs”).
I have made it a point to be un-anonymous, for the last few years. If you look at my HN handle, it's easy to see who I am, and to look at my work.
This was not always the case. I used to be a Grade A asshole, and have a lot to atone for.
I also like to make as much of my work open, as I can.
No we won't. Just build your web of trust and leave the rest of us anonymous and alone.
You're just doing the bidding of corporations who want to sell ID online systems for a more authoritarian world.
Those systems also use astroturfing. It was not invented with LLMs.
See my other comment https://news.ycombinator.com/item?id=44130743#44150878 for how this is "bleak" mostly if you were comfortable with your Overton window and censorship.
> leave the rest of us anonymous and alone
No one is trying to take away your right to host or participate in anonymous discussions.
> Those systems also use astroturfing. It was not invented with LLMs.
No one is claiming that LLMs invented astroturfing, only that they have made it considerably more economical.
> You're just doing the bidding of corporations who want to sell ID online systems for a more authoritarian world.
Sure, man. Funny that I mentioned "web of trust" as a potential solution, a fully decentralised system designed by people unhappy with the centralised nature of PKI. I guess I must be working in deep cover for my corporate overlords, cunningly trying to throw you off the scent like that. But you got me!
If you want to continue drinking from a stream that's been becoming increasingly polluted since November 2022, you're welcome to do so. Many other people don't consider this an appealing tradeoff and social systems used by those people are likely to adjust accordingly.
You could have authenticated proofs of human-ness without providing your full identity. There are similar systems today which can prove your age without providing your full identity.
> I think that, ultimately, systems that humans use to interact on the internet will have to ditch anonymity.
Relevant meme video (which, in my opinion, is worth your time to watch):
You ditch anonymity, and you have this cascading chilling effect through the interwebs, because you cannot moderate communities against the political headwinds of your nations.
Worse, it won’t work. We are already able to create fake human accounts, and it’s not even a contest.
And with LLMs, I can do some truly nefarious shit. I could create articles about some discovery of an unknown tribe in the Amazon, populate some unmanned national Wikipedia version with news articles, and substantiate the credentials of a fake anthropologist, and use that identity to have a bot interact with people.
Heck I am bad at this, so someone is already doing something worse than what I can imagine.
Essentially, we can now cheaply create enough high-quality supporting evidence for proof of existence. We can spoof even proof-of-life photos to the point that account-takeover resolution tickets can't be sure whether the selfies are faked. <Holy shit, I just realized this. Will people have to physically go to Meta offices now to recover their accounts???>
Returning to moderation, communities online, and anonymity:
The reason moderation and misinformation has been the target of American Republican Senators is because the janitorial task of reducing the spread of conspiracy theories touched the conduits carrying political powers.
That threat to their narrative production and distribution capability has unleashed a global campaign to target moderation efforts and regulation.
Dumping anonymity requires us to basically jettison ye olde internet.
I kind of wonder if I care whether comments are real people, and I actually probably don't, as long as they're thought-provoking. I actually thought it would be an interesting experiment to make my own walled-garden LLM link aggregator, sans all the rage bait.
I mean, I care if meetup.com has real people, and I care if my kids' school's Facebook group has real people, and other forums where there is an expectation of online/offline coordination, but Hacker News? Probably not.
I feel like part of why comments here are thought provoking is because they're grounded in something? It's not quite coordination, but if someone talks about using software at a startup or small company I do assume they're genuine about that, which tells you more about something being practical in the real world.
And use cases like bringing up an issue on HN to get companies to reach out to you and fix it would probably be much harder with LLMs taking up the bandwidth.
I could understand that position, except that I don't think most LLM-generated text is produced for the purpose of thought-provoking conversation.
My expectation would be that anyone going to the effort of putting an LLM-driven comment bot online is doing it for some ulterior motive, typically profit or propaganda.
Given this, I would equate not caring about the provenance of a comment with not caring whether you're being intentionally misinformed for some deceptive purpose.
Agree. Another complicating factor for detection is that I don't personally mind seeing a sliver of self-promotion in a comment/post if I feel it's "earned" by the post being on-topic and insightful overall. If such a comment was posted by an LLM, I think I would actually be fine with that.
The one comment is either a splendid illustration or a great piece of sarcasm.
True. However I'm sure an LLM would be able to filter that one out without problems. /s
In the case that it’s a human posting sarcastically, wouldn’t that be a false positive?
No worries, this won't last long.
Once the algorithms predominantly feed on their own shit the bazillion dollar clown party is over.
This has been debunked (to me) here: https://simonwillison.net/2024/Dec/31/llms-in-2024/#syntheti...
That seems to say only that synthetic data is a larger part of models today than in the past. The newer OpenAI models are known to hallucinate more. Claude 4 seems great, but not a multiplier better. That makes me think the effect of synthetic data is at best a net zero. It still remains to be seen, though.
Disagreeing with something is not debunking.
Debunked is a bit too strong. He quotes the phi-4 report saying that it is easier for the LLM to digest synthetic data. A bit like feeding broiler chickens other dead chickens.
Maybe one day we will have organic LLMs guaranteed to be fed only human generated content.
Even supposing the purported "model collapse" does occur, it doesn't destroy the LLMs we already have -- which are clearly already capable of fooling humans. I don't see the clown party being over, just reaching a stable equilibrium.
Exactly. It logically can't occur, even by the flawed assumptions of the people who say this. Just freeze all training data at 2024 or keep existing models; the worst-case scenario is that the models plateau.
They're also out of date, which makes them useless pretty fast.
Google discovered the only way to ultimately resolve spam is to raise the cost to create it.
For web spam this was HTTPS. For account spam this is phone # 2fa. I think requiring a form of id or payment card is the next step.
Twitter, LinkedIn, and others are following the credit-card and ID (KYC) route, but the issue remains once people start automating interactions: not spam per se, but a waste of time, since users cannot cope with zillions of triggered interactions that cannot be followed at human speed.
So they are going to allow only YT premium subs to post comments?
Because if there's one place where Google didn't solve spam, it's on YT's comments
> Because if there's one place where Google didn't solve spam, it's on YT's comments
I do believe that this problem is very self-inflicted (and perhaps even desired) by YouTube:
- The way comments on YouTube are structured and ordered makes it very hard to have deep discussions there.
- I think there is also a limit on comment length on YouTube, which again makes it hard to write longer, sophisticated arguments.
- Videos for which a lot of comments are generated tend to be promoted by YouTube's algorithm. Thus YouTubers encourage viewers to write lots of comments (and thus also a lot of low-quality comments), i.e. YouTube incentivizes videos being "spammed" with comments.
The correct solution would be to incentivize few, but high-quality, comments (i.e. de-incentivize comments that contribute nothing valuable, nothing worth your time to read). This would make it much easier to detect and remove the (real) spam among them.
Maybe, but I mean in general for internet participation.
If you make people pay to comment, content farms will gladly pay.
This doesn't work in perpetuity. One of the reasons why spam is so persistent is that when you ban a spammer, they can just create a new identity and go again. If payment is required, then not only do they have to pay again every time they get banned, they also need a new payment card, because you aren't limited to banning their account – you can ban the payment mechanism they used.
Yes… but there will be less spam and it will be more intelligent because the creator must break even.
Indeed it is https://youtu.be/-gGLvg0n-uY?feature=shared
What if we charged a small toll for comments? We create a web standard where you can precharge an amount to your browser account, then you get charged $0.02 for making a comment. The price could be progressively raised until the spammers stop. The profit could pay for website hosting. This would be affordable for users but prohibitively expensive for spammers.
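A back-of-the-envelope sketch of how such a toll might work, assuming in-memory balances and a crude price ratchet; none of this corresponds to an existing web standard.

```python
# Back-of-the-envelope sketch of the "pay $0.02 per comment" idea. Balances,
# the starting price, and the ratchet rule are illustrative assumptions only.
balances = {"reader": 5.00, "spam_farm": 5.00}   # precharged browser accounts
price = 0.02

def post_comment(user: str) -> bool:
    """Deduct the toll; refuse the comment if the account can't cover it."""
    if balances.get(user, 0.0) < price:
        return False
    balances[user] -= price
    return True

def raise_price_until_spam_stops(spam_rate: float) -> None:
    """Crude ratchet: bump the toll while measured spam stays above a threshold."""
    global price
    if spam_rate > 0.10:            # assumed threshold
        price = round(price * 2, 2)

# At the base price, a spam farm posting 10,000 comments needs $200 up front,
# and a ban also burns the payment instrument behind the balance.
```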
https://craphound.com/spamsolutions.txt
I seem to remember MS having this idea for email many years ago.
The problem originates from LLM services, so the toll needs to be on LLM usage, in a way that doesn't harm legitimate users but makes it unprofitable to abuse in bulk.
I'm not that worried because most content and content moderation became heavily homogenized through censorship and political manipulation to the point that a sizeable number of conversations and posts provided very little space for "breakthrough value" or "original value". Of course, if you're part of the Overton window you're just now concerned but if you weren't, you're actually excited to see the disruption.
I do recognize that the capability of bots to do harm in many spaces and to impose costs is a real thing to contend with, but the paradigm shift is fascinating. Suddenly people need to question authority (LLM output). Awesome. You should've been doing that all along.
The internet sometimes feels like living in a holographic world, as in "The Invention of Morel" [1].
A recent anecdote: an acquaintance of mine automated parts of his LinkedIn activity. After I liked one of his posts, I received an automatic message asking how I was doing. I recognized that the message didn't match a personal tone, but I replied anyway to catch up. He never responded, highlighting how people are automating the engagement process but can't really keep up with the manual follow-through.
[1] https://en.wikipedia.org/wiki/The_Invention_of_Morel
https://archive.ph/Vyasv
I've got a spouse who works in marketing/communications who has spent the weekend working after hours moderating comments on posts about Pride events, and I was musing with her about this: the days of comments being a thing at all are numbered. As a means of getting engagement, they increasingly attract the wrong kind, not just because of generative AI automation, but because being an asshole is now considered virtuous by many of our highest leaders, and the masses are following.
What's the point in even having comments sections? The CBC here in Canada shut theirs off years ago and frankly the world is better for it. News articles are a swamp of garbage comments, generally.
The future of social engagement online is to go back to smaller, registration-required, moderated forums.
Amused that the third comment is the Tirreno guy continuing to spam his project [0]. Good ol' human spam will never go out of style!
[0]: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
Thank you for mentioning tirreno.
Spam is one of the use cases for tirreno. I'm not sure why you'd call this spamming, as the tool is relevant to the problem.
Since detecting LLMs is a silly end goal, the future of moderation probably needs LLMs too, but to evaluate text and see whether it amounts to blatant commercial speech. It will ruin places where some kinds of commercial speech are wanted (say, asking for a recommendation on Reddit). Still, the mindless recommendation of crypto rugpulls and other similar scams will go away.
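As a sketch of what LLM-assisted screening for blatant commercial speech could look like, the following uses the OpenAI Python SDK; the model name, prompt, and "COMMERCIAL"/"OK" labels are assumptions, and in practice you would flag for human review rather than auto-delete.

```python
# Sketch of LLM-assisted moderation: ask a model whether a comment is blatant
# commercial speech. Model name, prompt, and labels are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def looks_commercial(comment: str) -> bool:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Reply with exactly COMMERCIAL if the comment is blatant "
                        "advertising or a scam pitch, otherwise reply with exactly OK."},
            {"role": "user", "content": comment},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip() == "COMMERCIAL"

if looks_commercial("Act now!! 1000x gains guaranteed, join my presale at ..."):
    print("queue for human review")  # flag rather than silently delete
```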
I am more concerned about voice-alignment efforts, like someone creating 10k real-ish accounts over time that attempt to contribute but are really there just to abuse upvote features and change perception. Ultimately, figuring out what is a real measure of popularity, and what is just a campaign to, say, send people to your play, is going to get even harder than it is now.
> It will ruin places where some kinds of commercial speech are wanted (say, asking for a recommendation on Reddit).
There is also a dependence on the culture. For example, what in the USA would be considered a "recommendation" (such as on Reddit) would often be considered "insanely pushy advertising" in Germany.
With this in mind, wouldn't a partial solution also be to become less tolerant of such pushy advertising in those places (say, on Reddit), even when it comes from honest users?
When it's obvious that entire posts and users are fake, and knowing that product pages on Amazon (which are also sometimes fake) can change what product they list for sale, and since it is known that upvotes/likes/shares are openly for sale, is it really such a stretch to assume that all "recommendations" are as fake as the original question also likely is, until we have evidence to the contrary?
There are a lot of people downplaying the importance of genuine online comments, but the reality is that (outside of the bubbles lived in by many HN users) millions upon millions of people are meaningfully participating and forming their viewpoints based on them.
I suspect even the 'well I never trust online comments!' crowd here is not as immune to propaganda as they'd like to believe themselves to be
On Reddit I'm seeing a ton of what seems like engagement or karma spam that looks LLM-generated.
It will be a story or question with just enough hints at personal drama and non specifics to engage the community. The stories always seem like a mishmash of past popular posts.
They’re usually posted by brand new accounts that rarely if ever post a comment.
Some subs seem relatively free of them, others inundated with them.
LLMs do seem like a major issue for spam. Does Hacker News deal with any of this? I presume yes, but if so, how?
HN users can make 5 comments every 4 hours, and I presume there's some hurdle in the way of making a lot of users as well.
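For illustration, a sliding-window limiter in the spirit of that "5 comments every 4 hours" rule might look like the sketch below; the numbers come from the comment above, and the implementation is purely illustrative, not HN's actual code.

```python
# Sliding-window sketch of a "5 comments per 4 hours" style limit.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 4 * 60 * 60
MAX_COMMENTS = 5
recent = defaultdict(deque)   # user -> timestamps of recent comments

def may_comment(user: str, now: float | None = None) -> bool:
    now = time.time() if now is None else now
    stamps = recent[user]
    while stamps and now - stamps[0] > WINDOW_SECONDS:
        stamps.popleft()                  # drop comments outside the window
    if len(stamps) >= MAX_COMMENTS:
        return False
    stamps.append(now)
    return True

for i in range(7):
    print(i, may_comment("new_account", now=1000.0 + i))
# comments 0-4 are allowed, 5 and 6 are rejected until the window slides
```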
Also they get downvoted and hidden, alongside controversial but correct opinions.
The person who replied to this saying they have multiple accounts is shadowbanned - so clearly, they don't really have multiple accounts.
karma
Karma only sorts by popularity. LLMs will do best at that, not worst, especially if you use a plain GPT rather than one of the RLed ones.
Has never worked solo. Someone just seeds more accounts, matures them, and then sells them to spammers.
This reminds me of the first time someone recommended Mr. Beast’s Feastables™ milk chocolate bars as a comment on one of my posts.
I ended up going to my local Walmart to try one, and boy was it delicious! Sometimes things work out in life.
Doesn’t it seem like LLMs can assist with moderation rather than making it harder?
I'm not sure exactly why we are still waiting for the obviously possible ad-hominem and sunk-cost-fallacy detectors, etc. For the first time we now have the ability to actually build a threaded comment system that (tries to) insist on rational and on-topic discussion. Maybe part of it is that we haven't actually made the leap yet to wanting to censor non-contributing but still-human "contributors" in addition to spam. I guess shitposting is still part of the "real" attention economy and important for engagement.
The apparently on topic but subtly wrong stuff is certainly annoying and in the case of vaguely relevant and not obviously commercial misinformation or misapprehension, I’m not sure how to tell humans from bots. But otoh you wouldn’t actually need that level of sophistication to clean up the cesspool of most YouTube or twitter threads.
That would presume that the moderation knows the truth, that a single truth even exists and that the moderation itself is unbiased.
It would also presume that an LLM knows the truth, which it does not. Even in technical and mathematical matters it fails.
I do not think an LLM can even accurately detect ad-hominem arguments. Is "you launched a scam coin scheme in the first days of your presidency and therefore I don't trust you on other issues" an ad-hominem or an application of probability theory?
Suppose you're right; even then, an LLM can still label that as hostile or confrontational, implying that we at least now have the ability to try to filter threads on a simple axis like "arguing" vs "information" vs "anecdote", and along other dimensions much more sophisticated than classic sentiment analysis.
We might struggle to differentiate information from disinformation, sure, but the above-mentioned new superpowers are still kind of remarkable, and easily accessible. And yet that "information only please" button is still missing and we are smashing simple up/down votes like cavemen.
Actually when you think about even classic sentiment analysis capabilities it really shows how monstrous and insidious algorithmic feeds are.. most platforms just don’t want to surrender any control to users at all, even when we have the technology.
> "Doesn’t it seem like LLMs can assist with moderation rather than making it harder?"
The moderators will need to pay for LLM service to solve a problem created by malicious actors who are paying for LLM service also? No wonder the LLM providers have sky-high valuations.
Compute providers are gonna get paid, yeah. We can hope, though, that there's something asymmetric in the required expense for good guys vs bad guys. For example, "subtly embed an ad for X while you pretend to reply to Y" does seem like a harder problem that you need a cloud model for. TFA mentioned crypto blog spam, which could easily be detected with keywords and a local LLM, or no LLM at all.
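For the "keywords, no LLM at all" case, a trivial first-pass filter is enough to catch the most blatant crypto spam; the keyword list and threshold below are invented for illustration.

```python
# Trivial first-pass filter for the "keywords, no LLM at all" case mentioned
# above. The pattern list and score threshold are invented for illustration.
import re

CRYPTO_SPAM_PATTERNS = [
    r"\bairdrops?\b", r"\bpresale\b", r"\brug ?pulls?\b",
    r"\b\d{2,4}x (gains|returns)\b", r"\bguaranteed profit\b",
    r"wallet (recovery|sync) (site|link)",
]

def crypto_spam_score(text: str) -> int:
    text = text.lower()
    return sum(1 for pattern in CRYPTO_SPAM_PATTERNS if re.search(pattern, text))

comment = "Huge news: guaranteed profit, 100x gains if you join the presale today"
print(crypto_spam_score(comment))          # 3
print(crypto_spam_score(comment) >= 2)     # True -> hold for review, no LLM needed
```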
There’s already a lightweight LLM tool for moderation that doesn’t take much to run.
Hey, this is part of my thesis and what I’m working towards figuring out.
People are already working on LLMs to assist with content moderation (COPE). Their model can apply a given policy (e.g. a harassment policy) to a piece of content and judge whether it matches the criteria. So the tooling will be made, one way or another.
My support for the thesis is also driven by how dark the prognostications are.
We won’t be able to distinguish between humans and bots, or even facts soon. The only things which will remain relatively stable are human wants / values and rules / norms.
Bots that encourage pro-social behavior, norms, and more are definitely needed, precisely as the natural survival tools we will need.
There is, as ever, an XKCD for this: https://xkcd.com/810/
We’re now finding that sounding helpful and constructive does not equal being helpful and constructive. I wonder what an updated comic would say.
Except with a scam you can be just as constructive and helpful right up until you deploy the scam.
"Unavailable Due to the UK Online Safety Act"
The author wrote another blog post "Geoblocking the UK with Debian & Nginx"[0]. It's a short tutorial to do exactly as the title says, so it looks like the author did apply this configuration and intentionally want to geoblock the UK for compliance reasons, or maybe as a statement. The blog post has a link to https://geoblockthe.uk
[0] https://aphyr.com/posts/379-geoblocking-the-uk-with-debian-n...
https://archive.ph/wfcyv
Use a VPN.
Or just don't bother.
Always has been, welcome to the internet.
See also https://aphyr.com/posts/387-the-future-of-customer-support-i... for more AI slop nonsense.
Comments have always been "bullshitting", and LLMs are a tool to help bullshitters quickly generate additional bullshit.
LLMs are going to reduce the value of bullshit. Look at how it's already decimating the marketing industry!
I just bullshitted those last couple sentences though.
I really don't envy anyone who has to moderate anything at the moment.
But yeah. The vast majority of user generated content on the big platforms was already very loosely moderated, and was already mostly trash.
The platforms are just going to keep on doing what they always do, which is optimize for engagement. It's not the crappy AI comments I'm worried about, it's the good ones. These things will become much better than humans at generating clickbait, outrage, and generally chewing up people's time and sending their amygdalas into overdrive.
I think we're going to keep getting more of what we have now, only more optimized, and therefore worse for us. As the AIs get good we will evolve an even more useless, ubiquitous, addictive, divisive, ad-revenue-driven attention economy. Unplugging your brain will be harder to do but even more worth doing. Probably most people still will not do it. Getting serious dystopia vibes over all this.
God, it's bleak in trust and safety/content moderation/fact checking. And I'm not even talking about America - good luck to you lovely weirdos.
One of the answers to “how do we solve this mess” was “climate change”. (Dealing with depressing things does funny things to humans).
One report on cyber security (which had Bruce Schneier as an author) showed that LLMs make hitherto unprofitable phishing targets profitable.
There's even a case where an employee didn't follow their phishing training and clicked on a link, and ended up in a Zoom call with their team members, transferring a few million in USD to another account. Except everyone on the call was faked.
This is the stuff on the fraud and cyber-crime axis, never mind the stuff for mundane social media. We're at the stage where kids are still posting basic GenAI output after prompting "I think vaccines are bad and need to warn people". They are going to learn FAST how to mask this content. Hoo boy.
Dystopia vibes? It’s like looking into the abyss and seeing the abyss reach out to give you a hug.