Comment by Fr0styMatt88

11 hours ago

I find the sound is a dead giveaway for most AI videos — the voices all sound like a low bitrate MP3.

Which will eventually get worked around and can easily be masked by just having a backing track.

That sounds like one of the worst heuristics I've ever heard, worse than "em-dash=AI" (em-dash equals AI to the illiterate class, who don't know what they are talking about on any subject and who also don't use em-dashes, but literate people do use em-dashes and also know what they are talking about. This is called the Dunning-Em-Dash Effect, where "dunning" refers to the payback of intellectual deficit, whereas the illiterate think it's a name).

  • The em-dash=LLM thing is so crazy. For many years Microsoft Word has AUTOCORRECTED the typing of a single hyphen to the proper syntax for the context -- whether a hyphen, en-dash, or em-dash.

    I would wager good money that the proliferation of em-dashes we see in LLM-generated text is due to the fact that there are so many correctly used em-dashes in publicly-available text, as auto-corrected by Word...

    • Which would matter, but the text entry box in no major browser does this.

      The HN text area does not insert em-dashes for you and never has. On my phone keyboard it's a very deliberate action to add one (symbol mode, long press hyphen, slide my finger over to em-dash).

      The entire point is it's contextual - em-dashes where no accommodations make them likely.


  • The audio artifacts of an AI generated video are a far more reliable heuristic than the presence of a single character in a body of text.

    • Well, it's probably lower false positive than em-dash but higher false negative, especially since AI-generated video, even when it has audio, may not have AI-generated audio. (Generation conditioned on a text prompt, starting image, and audio track is among the common modes for AI video generation.)

  • Thank you for saving me the time writing this. Nothing screams midwit like "Em-dash = AI". If AI detection was this easy, we wouldn't have the issues we have today.

  • Of note is the other terrible heuristic I've seen thrown around, where "emojis = AI", and now the "if you use not X, but Y = AI".

    • With the right context both are pretty good actually.

      I think the emoji one is most pronounced in bullet point lists. AI loves to add an emoji to bullet points. I guess they got it from lists in hip GitHub projects.

      The other one is not as strong, but if the "not X but Y" is somewhat nonsensical or unnecessary, this is a very strong indicator it's AI.


    • Similarly: "The indication for machine-generated text isn't symbolic. It's structural." I always liked this writing device, but I've seen people label it artificial.

    • Em-dashes are completely innocent. “Not X but Y” is some lame rhetorical device, I’m glad it is catching strays.

  • No one uses em dashes

    • If nobody used em-dashes, they wouldn’t have featured heavily in the training set for LLMs. They are used somewhat rarely (some people use them a lot, others not at all) in informal digital prose, but that’s not the same as being entirely unused generally.

    • Except for Emily Dickinson, who is an outlier and should not be counted.

      Seriously, she used dashes all the time. Here is a direct copy and paste of the first two stanzas of her poem "Because I could not stop for Death" from the first source I found, https://www.poetryfoundation.org/poems/47652/because-i-could...

        Because I could not stop for Death –
        He kindly stopped for me –
        The Carriage held but just Ourselves –
        And Immortality.
      
        We slowly drove – He knew no haste
        And I had put away
        My labor and my leisure too,
        For His Civility –
      

      Her dashes have been rendered as en dashes in this particular case rather than em dashes, but unless you're a typography enthusiast you might not notice the difference (I certainly didn't and thought they were em dashes at first). I would bet if I hunted I would find some places where her poems have been transcribed with em dashes. (It's what I would have typed if I were transcribing them).

    • Except for highly literate people, and people who care about typography.

      Think about it— the robots didn’t invent the em-dash. They’re copying it from somewhere.


    • Tell me you never worked with LaTeX and a university style guide without telling me you never worked with LaTeX and a university style guide.
