Comment by fsckboy

12 hours ago

that sounds like one of the worst heuristics I've ever heard, worse than "em-dash=ai" (em-dash equals ai to the illiterate class, who don't know what they are talking about on any subject and who also don't use em-dashes, but literate people do use em-dashes and also know what they are talking about. this is called the Dunning-Em-Dash Effect, where "dunning" refers to the payback of intellectual deficit whereas the illiterate think it's a name)

The em-dash=LLM thing is so crazy. For many years Microsoft Word has AUTOCORRECTED the typing of a single hyphen to the proper syntax for the context -- whether a hyphen, en-dash, or em-dash.

I would wager good money that the proliferation of em-dashes we see in LLM-generated text is due to the fact that there are so many correctly used em-dashes in publicly-available text, as auto-corrected by Word...

  • Which would matter, but the entry box in no major browser does this.

    The HN text area does not insert em-dashes for you and never has. On my phone keyboard it's a very deliberate action to add one (symbol mode, long press hyphen, slide my finger over to em-dash).

    The entire point is that it's contextual - em-dashes where no accommodations make them likely.

    • Is this—not an em-dash? On iOS I generated it by double tapping dash. I think there are more iOS users than AIs, although I could be wrong about that…

    • Yeah, I get that. And I'm not saying the author is wrong, just commenting on that one often-commented-upon phenomenon. If text is being input to the field by copy-paste (from another browser tab) anyway, who's to say it's not (hypothetically) being copied and pasted from the word processor in which it's being written?

The audio artifacts of an AI generated video are a far more reliable heuristic than the presence of a single character in a body of text.

  • Well, it's probably lower false positive than em-dash but higher false negative, especially since AI-generated video, even when it has audio, may not have AI-generated audio. (Generation conditioned on a text prompt, starting image, and audio track is among the common modes for AI video generation.)

Thank you for saving me the time writing this. Nothing screams midwit like "Em-dash = AI". If AI detection was this easy, we wouldn't have the issues we have today.

Of note is the other terrible heuristic I've seen thrown around, where "emojis = AI", and now the "if you use not X, but Y = AI".

  • With the right context both are pretty good actually.

    I think the emoji one is most pronounced in bullet point lists. AI loves to add an emoji to bullet points. I guess they got it from lists in hip GitHub projects.

    The other one is not as strong, but if the "not X but Y" is somewhat nonsensical or unnecessary, this is a very strong indicator it's AI.

    • >I guess they got it from lists in hip GitHub projects.

      I see this way more often on GitHub now than I did before, though.

  • Similarly: "The indication for machine-generated text isn't symbolic. It's structural." I always liked this writing device, but I've seen people label it artificial.

  • Em-dashes are completely innocent. “Not X but Y” is some lame rhetorical device, I’m glad it is catching strays.

No one uses em dashes

  • If nobody used em-dashes, they wouldn’t have featured heavily in the training set for LLMs. It is used somewhat rarely (some people use it a lot, others not at all) in informal digital prose, but that’s not the same as being entirely unused generally.

  • Microsoft Word automatically converts dashes to em dashes as soon as you hit space at the end of the next word after the dash.

    • That's the only way I know how to get an em dash. That's how I create them. I sometimes have to re-write something to force the "dash space <word> space" sequence in order for Word to create it, and then I copy and paste the em dash into the thing I'm working on.

  • I do—all the time. Why not?

    I also use en dashes when referring to number ranges, e.g., 1–9

    • I didn't know these fancy dashes existed until I read Knuth's first book on typesetting. So probably 1984. Since then I've used them whenever appropriate.

  • Except for Emily Dickinson, who is an outlier and should not be counted.

    Seriously, she used dashes all the time. Here is a direct copy and paste of the first two stanzas of her poem "Because I could not stop for Death" from the first source I found, https://www.poetryfoundation.org/poems/47652/because-i-could...

      Because I could not stop for Death –
      He kindly stopped for me –
      The Carriage held but just Ourselves –
      And Immortality.
    
      We slowly drove – He knew no haste
      And I had put away
      My labor and my leisure too,
      For His Civility –
    

    Her dashes have been rendered as en dashes in this particular case rather than em dashes, but unless you're a typography enthusiast you might not notice the difference (I certainly didn't and thought they were em dashes at first). I would bet if I hunted I would find some places where her poems have been transcribed with em dashes. (It's what I would have typed if I were transcribing them).

  • Except for highly literate people, and people who care about typography.

    Think about it— the robots didn’t invent the em-dash. They’re copying it from somewhere.

    • My impression of people that say they’re em dash users is that they’re laundering their Dunning-Kruger through AI.

  • Tell me you never worked with LaTeX and a university style guide without telling me you never worked with LaTeX and a university style guide.