Kitten TTS: 25MB CPU-Only, Open-Source Voice Model

7 days ago (algogist.com)

Very cool model, but the post is a caricature of AI writing. "Okay, let's get into the nitty-gritty. What makes this little beast tick? These aren't just bullet points on a GitHub README; these are the specs that will fundamentally redefine what you thought was possible with local AI." Sure.

  • Everybody always thinks everything is AI. AI learned from consuming writing.

    This is an ouroboros that will continue.

    (Not saying this is or isn't, simply that these claims are rampant on a huge number of posts and seem to be growing.)

    • This is strictly true but not correct. LLMs were trained on human-written text, but they were post-trained to generate text in a particular style. And that style does have some common patterns.

  • The writing style we associate with AI is the 2010s blogging style that AI learned from... So it definitely could have been written by a person.

    • No it isn't, it's something new born from ingesting that stuff... That's exactly why a lot of us can detect it from a mile away.

      No human comments on meta formatting like that outside the deepest trenches of Apple/FB corporate stuff.

  • I think it’s fair enough to just say that the writing is cringe, AI or not.

  • Indeed the blurb is absurd and very off-putting. It's not a big deal that "It clocks in at under 25MB with just 15 million parameters", because text to speech is a long-solved problem; the Texas Instruments Speak & Spell from 1978 (nearly half a century ago, FFS) solved it, probably with a good deal less than 25MB.

    • The Speak & Spell was a toy. I loved it as a kid in the eighties. But it was very limited and sounded terrible.