Comment by Antibabelic

9 hours ago

I found the page Wikipedia:Signs of AI Writing[1] very interesting and informative. It goes into a lot more detail than the typical "em-dashes" heuristic.

[1]: https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing

An interesting observation from that page:

"Thus the highly specific "inventor of the first train-coupling device" might become "a revolutionary titan of industry." It is like shouting louder and louder that a portrait shows a uniquely important person, while the portrait itself is fading from a sharp photograph into a blurry, generic sketch. The subject becomes simultaneously less specific and more exaggerated."

  • I think that's a general guideline to identify "propaganda", regardless of the source. I've seen people in person write such statements with their own hands/fingers, and I know many people who speak like that (shockingly, most of them are in management).

    Lots of those points seem to get at the same idea, which seems like a good balance: it's the language itself that is problematic, not how the text came to be, so it makes sense to target the language of the text directly.

    Hopefully those guidelines make all text on Wikipedia better, not just the LLM-produced text, because they seem like generally good guidelines even outside the context of LLMs.

  • To me that suggests we're mistaken in mixing fiction and non-fiction in AI training data. The "a revolutionary titan of industry" makes sense if you were reading a novel, where something like 90% of the book is describing the people, locations, objects and circumstances. The author of a novel would want to use exaggeration and more colourful words to underscore a uniquely important person, but "this week in trains" would probably de-emphasize the person and focus on the train-coupler.

    • Fiction is part of our shared language and culture. We communicate by making analogies, and our stories, especially our old ones, provide a rich basis to draw upon. Neither a person nor an LLM can be a fluent user of human language without spending time learning from both fiction and non-fiction.

  • The funny thing is that this also appears in bad human writing. We would be better off if vague statements like this were eliminated altogether, or replaced with less fantastical but verifiable statements. If that means nothing of the article is left, then we have killed two birds with one stone.

    • What do you think the LLMs were trained on? 90% of everything is crap, and they trained on everything.

  • That actually puts into words what I couldn't, but felt similarly. Spectacular quote.

    • I'm thinking quite a bit about this at the moment in the context of foundation models and their inherent (?) regression to the mean.

      Recently there has been a big push into geospatial foundation models (e.g. Google AlphaEarth, IBM Terramind, Clay).

      These take in vast amounts of satellite data and, with the usual autoencoder architecture, try to build embedding spaces that contain meaningful semantic features.

      The issue at the moment is that in the benchmark suites (https://github.com/VMarsocci/pangaea-bench), only a few of these foundation models have recently started to surpass the basic U-Net in some of the tasks.

      There's also an observation by one of the authors of Major-TOM, which also provides satellite input data for training models, that the usual scaling law does not seem to hold for geospatial foundation models: more data does not seem to result in better models.

      My (completely unsupported) theory on why that is: unlike writing or coding, in satellite data you are often looking for the needle in the haystack. You do not want what has been done thousands of times before and was proven to work. Segmenting out forests and water? Sure, easy. These models have seen millions of examples of forests and water. But most often we are interested in things that are much, much rarer: flooding, wildfires, earthquakes, landslides, destroyed buildings, new airstrips in the Amazon, and so on. As I see it, the currently used frameworks do not support that very well (a rough toy calculation of the effect is sketched below).

      But I'd be curious how others see this, who might be more knowledgeable in the area.
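
      To make the "needle in the haystack" point concrete, here is a minimal sketch in Python. All class frequencies and per-pixel errors are made-up, purely illustrative numbers, not taken from any real benchmark; the only point is the order of magnitude of the rare class's contribution to a plain pixel-wise reconstruction loss.

        # Toy illustration with invented numbers: how much a rare event class
        # contributes to a plain pixel-wise reconstruction (MSE) objective.

        # Assumed fraction of pixels belonging to each class in a large corpus.
        class_freq = {
            "forest": 0.45,
            "water": 0.30,
            "cropland": 0.20,
            "urban": 0.049,
            "flood/wildfire": 0.001,  # the rare events we actually care about
        }

        # Assumed mean squared reconstruction error per pixel of each class:
        # good on the common classes, far worse on the rare one.
        per_pixel_mse = {
            "forest": 0.01,
            "water": 0.01,
            "cropland": 0.02,
            "urban": 0.05,
            "flood/wildfire": 0.50,
        }

        total = sum(class_freq[c] * per_pixel_mse[c] for c in class_freq)
        for c in class_freq:
            share = class_freq[c] * per_pixel_mse[c] / total
            print(f"{c:>14}: {share:6.1%} of the training loss")
        # Even with a 50x worse per-pixel error, the rare class contributes
        # only a few percent of the objective, so the optimiser has little
        # incentive to represent it well.

      With these invented numbers the rare class accounts for roughly 3-4% of the total loss, despite being the thing we care about most; everything else pulls the representation back towards the common cases.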

  • Outstanding. Praise Wikipedia; despite any shortcomings, isn't it such a breath of fresh air in the world of 2026?

This is very detailed, and everyone who is sick of reading generated text should read it.

I had a bad experience at a shitty airport, went to Google Maps to leave a bad review, and found that its rating was 4.7 from many thousands of people. Knowing that the airport is run by a corrupt government, I started reading those super positive reviews and the other, older reviews by the same people. People who could barely manage a few coherent sentences of English are now writing multiple paragraphs about the history and vital importance of that airport to the region.

Reading the first section, "Undue emphasis on significance", those fake reviews are all I can think of.

Ironically this is a goldmine for AI labs and AI writer startups to do RL and fine-tuning.

  • In the case of those big 'foundation models': Fine-tune for whom and how? I doubt it is possible to fine-tune things like this in a way that satisfies all audiences and training set instances. Much of this is probably due to the training set itself containing a lot of propaganda (advertising) or just bad style.

    • I'm pretty sure Mistral is doing fine tuning for their enterprise clients. OpenAI and Anthropic are probably not?

      I'm more thinking about startups for fine-tuning.

  • That's not quite how that works, though. It is possible, for example, that fine-tuning a model to avoid the styles described in the article causes the LLM to stop functioning as well as it otherwise would. It might just be an artefact of the architecture itself that, to be effective, it has to follow these rules. If it were as easy as providing data that the LLM would then 'encode' as a rule, we would be advancing much faster than we currently are.