Kitten TTS: 25MB CPU-Only, Open-Source Voice Model

7 days ago (algogist.com)

Very cool model, but the post is a caricature of AI writing. "Okay, let's get into the nitty-gritty. What makes this little beast tick? These aren't just bullet points on a GitHub README; these are the specs that will fundamentally redefine what you thought was possible with local AI." Sure.

  • Everybody always thinks everything is AI. AI learned from consuming writing.

    This is an ouroboros that will continue.

    (Not saying this is or isn't, simply that these claims are rampant on a huge number of posts and seem to be growing.)

    • This is strictly true but not correct. LLMs were trained on human-written text, but they were post-trained to generate text in a particular style. And that style does have some common patterns.

  • The writing style we associate with AI is the 2010s blogging style that AI learned from... So it definitely could have been written by a person.

    • No it isn't, it's something new born from ingesting that stuff... That's exactly why a lot of us can detect it from a mile away.

      No human comments on meta formatting like that outside the deepest trenches of Apple/FB corporate stuff.

  • I think it’s fair enough to just say that the writing is cringe, AI or not.

  • Indeed the blurb is absurd and very off-putting. It's not a big deal that "It clocks in at under 25MB with just 15 million parameters", because text to speech is a long-solved problem; the Texas Instruments Speak & Spell from 1978 (nearly half a century ago, FFS) solved it, probably with a good deal less than 25MB.

    • The Speak & Spell was a toy. I loved it as a kid in the eighties. But it was very limited and sounded terrible.