Comment by fastball

4 days ago

The weights are indeed open (both accessible and licensing-wise): you don't need to put that in scare quotes. The training code is not. You can fine-tune the weights yourself with your own training code. Saying that isn't open is like saying ffmpeg isn't open because it doesn't do everything I need it to do and I have to wrap it with my own code to achieve my goals.

It's really weird to say ByteDance's release is "more open" when the WaveVAE encoder isn't released at all, only the decoder, so new voices require submitting your sample to a public GDrive folder and getting extracted latents back through another public GDrive folder.

Machine learning assets are not binary "open" or "closed". There is a continuum of openness.

To make a really poor analogy, this repo is like a version of Linux that you can't cross-compile or port.

To make another really poor (but fitting) analogy, this is like an "open core" SaaS platform that you know you'll never be able to run the features that matter on your own.

This repo scores really low on the "openness" continuum. In this case, you're very limited in what you can do with Chatterbox TTS. You certainly can't improve it or fit it to your data.

> You can fine-tune the weights yourself with your own training code.

This will never be built by anyone, and they know that. If it could be, they'd provide it themselves.

If you're considering Chatterbox TTS, just use MegaTTS3 [1] instead. It's better by all accounts.

[1] https://github.com/bytedance/MegaTTS3

  • > This will never be built by anyone, and they know that. If it could be, they'd provide it themselves.

    Community fine-tuning code has been developed in the past for open-weights models without public first-party training code.

  • Why can't you improve it or fit it to your data?

    This can be cross-compiled/ported in the Linux analogy. The Linux analogy would be more like: a kernel dev wrote code for some part of the Linux kernel using JetBrains' CLion. He used features of CLion that made this process much easier than if he had written the code using `nano`. By your logic, the resulting kernel code is not "open" because the tooling used to create it is not open. This is, of course, nonsense.

    I agree that the project as a whole is less open than it could be, but the weights are indeed as open as they can be, no scare quotes required.

    • I really don't think your analogy fits the absurdity of lacking the tooling. It's more like you have to decompile an N64 cartridge ROM and don't have the tools. But I don't want to play that game.

      I'll up the ante. I'll bet you money that nobody forks this and adds fine-tuning for at least a year.