Comment by treyd

8 days ago

This is strictly true but misleading. LLMs were trained on human-written text, but they were post-trained to generate text in a particular style. And that style does have some common patterns.

So are you saying all LLMs were post-trained in that style then?

Because, well, there's a huge number of models. Are they all, as they say, "in cahoots" (i.e., working together clandestinely)?

  • Examples of LLM-style text: short, punchy sentences; negative parallelism ("not just X, it's Y"); bullet points, especially with emojis and bolded text; and overuse of em-dashes.

    This is a good list: https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing

    It's one thing to observe "LLM-generated writing all looks the same". Whether the LLMs were all post-trained the same way is a different question.

    I don't agree "everyone says everything is AI". Do you have examples where a consensus of people are accusing something of being AI generated, where it lacks those indicators?

    • I consistently get accused of AI writing, but I’m not really sure why. I use spellcheck; that’s about it. Although I am a fan of an appropriately used em-dash, I don’t really follow the other patterns. I also find that people say it as a form of character assassination, though, literally denying your humanity lol.

    • Many of those rules are kind of hazy, though. Curly quotes, em-dashes, etc. are also signs of using MS Word for writing, for example.

    • Just reading through posts on here about various blogs/posts/opinion pieces, there always seems to be a handful of people who jump to "this is AI". And maybe it is! But the driving force behind this seems not to be identifying that something is AI, but spite for AI writing: a quick way to rule out caring about the material once it's identified as AI slop.

      The problem I see this leading to is plenty of legitimately written things getting thrown away because somebody's online exposure bubbles don't happen to include a lot of Medium or Tumblr or a certain Discord or whatever bubble where _huge_ groups of people actually are writing in whatever $STYLE the reader and commenter identifies as AI. Which then, because of their post, also gets other people to not even look.

      It seems like a disaster, frankly.


  • There was a time when everyone trained their models on ChatGPT output. You can still find open-source models that tell you they're ChatGPT if you ask.

  • Seems like many train on the output of other models for post-training and catch some kind of cooties.

  • If the people who develop and release these models were all optimizing for the same goals, they could converge on similar strategies or behaviors without coordinating.

I'm one of the unlucky ones: over the past fifteen years I've coincidentally trained myself to write in the style that is now largely recognized as the ChatGPT style— bolded lists, clear section breakdowns with intro and concluding sentences, correct and liberal use of semicolons and em-dashes. The only parts of it I don't do are littering my text with random emojis or directly addressing the reader with simpering praise.

  • Sounds like someone who shoots for simple but effective communication, to me.

    • I mean, that has always been my intention with it— particularly in the context of something like a ticket or design doc, where it's critical that other busy people be able to quickly get a high-level overview and then scan for the bits that are most relevant or interesting to them.

      It's just ironic that I've now been asked if I was using AI to write (or punch up) content that I've produced in this style when I most certainly was not.