Comment by MutedEstate45
6 days ago
The headline feature isn’t the 25 MB footprint alone. It’s that KittenTTS is Apache-2.0. That combo means you can embed a fully offline voice in Pi Zero-class hardware or even battery-powered toys without worrying about GPUs, cloud calls, or restrictive licenses. In one stroke it turns voice everywhere from a hardware/licensing problem into a packaging problem. Quality tweaks can come later; unlocking that deployment tier is the real game-changer.
yeah, we are super excited to build tiny ai models that are super high quality. local voice interfaces are inevitable and we want to power those in the future. btw, this model is just a preview, and the full release next week will be of much higher quality, along w another ~80M model ;)
> It’s that KittenTTS is Apache-2.0
Have you seen the code[1] in the repo? It uses phonemizer[2] which is GPL-3.0 licensed. In its current state, it's effectively GPL licensed.
[1]: https://github.com/KittenML/KittenTTS/blob/main/kittentts/on...
[2]: https://github.com/bootphon/phonemizer
Edit: It looks like I replied to an LLM generated comment.
The issue is even bigger: phonemizer uses espeak-ng, which isn't very good at turning graphemes into phonemes. In other TTS systems that rely on phonemes (e.g. Zonos), this turned out to be one of the key causes of bad generations.
And it isn't something you can fix, because the model was trained on bad phonemes (everyone uses Whisper and then phonemizes the text transcript).
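To make the pipeline concrete, here is a minimal sketch of that grapheme-to-phoneme step, using phonemizer's top-level phonemize() call with the espeak backend (the exact arguments are from memory, so treat the details as an assumption):

    from phonemizer import phonemize

    text = "Whisper transcribed this sentence."
    # espeak-ng does the actual grapheme-to-phoneme work here; any word it
    # mispronounces becomes "bad" phonemes in the training data, and the
    # model then learns to expect exactly those phonemes at inference time.
    phonemes = phonemize(text, language="en-us", backend="espeak", strip=True)
    print(phonemes)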
https://github.com/KittenML/KittenTTS/issues/17
> IANAL, but AFAICS this leaves 2 options, switching the license or removing that dependency.
There is a third option: asking the project for an exception.
Though that is unlikely to be granted[1], leaving you back with just the other two options.
And of course there's a fourth choice: just ignore the license. This is the option taken by companies like Onyx, whose products I might otherwise be interested in…
----
[1] Those of us who pick GPL3 or AGPL generally do so to keep things definite, and an exception would muddy the waters. It also might not be possible if the project has many maintainers, since relicensing would require agreement from everyone whose code is in the current release. Furthermore, if it has inherited the license from one of its dependencies, an exception is even less practical.
6 replies →
Once the license issues are resolved, it would be nice if you could install it on a distro with the normal package manager.
This would only apply if they were distributing the GPL licensed code alongside their own code.
If my MIT-licensed one-line Python library has this line of code…
…I’m not suddenly subject to bash’s licensing. For anyone wanting to run my stuff though, they’re going to need to make sure they themselves have bash installed.
(But, to argue against my own point, if an OS vendor ships my library alongside a copy of bash, do they have to now relicense my library as GPL?)
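For concreteness, the kind of one-liner presumably meant here, which merely invokes bash as an external process rather than linking against it, might look something like this (an illustrative sketch, not the original snippet):

    import subprocess

    def run_in_bash(command: str) -> str:
        # bash runs as a separate process; nothing from bash is linked into
        # this library, which is the crux of the argument above.
        result = subprocess.run(
            ["bash", "-c", command],
            capture_output=True, text=True, check=True,
        )
        return result.stdout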
The FSF thinks it counts as a derivative work, and that you have to use the LGPL if you want to allow linking from non-GPL code.
However, this has never actually been proven in court, and there are many good arguments that linking doesn't count as creating a derivative work.
An old post by a lawyer that someone else found (GPL version 3 wouldn't affect this): [1]
Personally, I don't really understand how, if dynamic linking were viral, using Linux to run code isn't viral. Surely at some level whatever Linux does to run your code calls GPLed code.
It doesn't really matter though, since the FSF's stance is enough to scare companies away from using it, and any individual is highly unlikely to be sued.
[1] https://www.linuxjournal.com/article/6366
2 replies →
> This would only apply if they were distributing the GPL licensed code alongside their own code.
As far as I understand the FSF's interpretation of their license, that's not true. Even if you only dynamically link to GPL-licensed code, you create a combined work which has to be licensed, as a whole, under the GPL.
I don't believe that this extends to calling an external program via its CLI, but that's not what the code in question seems to be doing.
(This is not an endorsement, but merely my understanding on how the GPL is supposed to work.)
This is a false analogy. It's quite straightforward.
Running bash (via exec()/fork()/spawn()/etc) isn't the same as (statically or dynamically) linking with its codebase. If your MIT-licensed one-liner links to code that's GPL licensed, then it gets infected by the GPL license.
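Sketched in code, the distinction the thread keeps circling around looks roughly like this (simplified; the espeak-ng CLI flags are from memory):

    # In-process library use: KittenTTS imports the GPL-licensed phonemizer
    # package, which in turn loads espeak-ng. This is the "combined work"
    # situation the commenters above are debating.
    from phonemizer import phonemize
    ipa = phonemize("hello world", language="en-us", backend="espeak")

    # Arm's-length use: running espeak-ng as an external program via its CLI,
    # which most commenters agree does not pull the calling code under the GPL.
    import subprocess
    out = subprocess.run(
        ["espeak-ng", "-q", "--ipa", "hello world"],
        capture_output=True, text=True, check=True,
    ).stdout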
3 replies →
GPL is for boomers at this point. Floppy disks? Distribution? You can use a tool but you can't change it? A DLL call means you need to redistribute your code but forking doesn't?
Silliness
2 replies →
Given that the FSF considers Apache-2.0 to be compatible with GPL-3.0 [0], how could the fact that phonemizer is GPL-3.0 possibly be an issue?
[0]: https://www.gnu.org/licenses/license-list.html#apache2
Compatible means they can be linked together, BUT the result is GPL-3.
1 reply →
Okay, what's stopping you from feeding the code into an LLM, having it rewrite it, and making it yours? You can even add extra steps, like making it analyze the code block by block and then supervising it as it rewrites. Bam. AI-age IP freedom.
Morals may stop you, but other than that? IMHO all open-source code is public domain code if anyone is willing to spend some AI tokens.
That would be a derivative work, and still be subject to the license terms and conditions, at best.
There is a standard way to approach this, called clean-room engineering.
https://en.m.wikipedia.org/wiki/Clean-room_design
One person reads the code and produces a detailed technical specification. Someone reviews it to ensure that there is nothing in there that could be classified as copyrighted material, then a third person (who has never seen the original code) implements the spec.
You could use an LLM at both stages, but you'd have to be able to prove that the LLM doing the implementation had no prior knowledge of the code in question... which, given how LLMs have been trained, seems to me to be very dubious territory until that legal situation gets resolved.
AI is useful for Chinese-walling code, but it's not as easy as you make it sound. To stay out of legal trouble, you should probably refactor the code into a different language, then back into the target language. In the end it turns into a process of being forced to understand the codebase while supervising its rewriting. I've translated libraries into another language using LLMs; I'd say that process was about half the labor of writing them myself. So going both ways, you may as well rewrite the code yourself… but working with the LLM will make you familiar with the subject matter, so you -could- rewrite the code. I guess you could think of it as a sort of buggy tutorial process?
2 replies →
Tell me you haven't used LLMs on large, non-trivial codebases without telling me... :)
2 replies →
Festival's English model, festvox-kallpc16k, is about 6 MB, and that is the large one; festvox-kallpc8k is about 3.5 MB.
eSpeak NG's data files take about 12 MB (multi-lingual).
I guess this one may generate more natural-sounding speech, but older or lower-end computers were capable of decent speech synthesis previously as well.
Custom voices could be added, but the speed was more important to some users.
$ ls -lh /usr/bin/flite
Listed as 27K last I checked.
I recall some Blind users were able to decode Gordon 8-bit dialogue at speeds most people found incomprehensible. =3
I'm not blind, but spoken English is far more difficult for me to grasp than written English (I'm a non-native speaker), and Flite runs on N270 netbooks at crazy speeds with voices that are good enough.
> KittenTTS is Apache-2.0
What about the training data? Is everyone 100% confident that models are not a derivative work of the training inputs now, even if they can reproduce inputs exactly?
I'm playing around with an NVIDIA Jetson Orin Nano Super right now, and it's actually pretty usable with gemma3:4b and quite fast; even image processing is done in about 10-20 seconds, but that's with GPU support. When something is not working and Ollama is not using the GPU, those calls take ages because the CPU is just bad.
I am curious how fast this is with CPU only.
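A rough way to answer that yourself, assuming the usage shown in the repo's README (the KittenTTS class, model id, and generate() call are taken from there and may change with the full release):

    import time
    from kittentts import KittenTTS

    m = KittenTTS("KittenML/kitten-tts-nano-0.1")  # model id per the README

    text = "This high quality TTS model works without a GPU."
    start = time.perf_counter()
    audio = m.generate(text)  # CPU-only inference via ONNX Runtime
    print(f"generated {len(audio)} samples in {time.perf_counter() - start:.2f}s")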
It depends on espeak-ng, which is GPLv3.
This opens up voice interfaces for medical devices, offline language learning tools, and accessibility gadgets for the visually impaired - all markets where cloud dependency and proprietary licenses were showstoppers.
But the Pi Zero has a GPU, so why not make use of it?
Because then you're stuck on that device only.
The GitHub repo just has a few KB of Python that looks like an install script. How is this used from C++?