Comment by echelon
3 days ago
I really don't think your analogy fits the absurdity of lacking the tooling. It's more like you have to decompile an N64 cartridge ROM and don't have the tools. But I don't want to play that game.
I'll up the ante. I'll bet you money that nobody forks this and adds fine-tuning for at least a year.
Someone already did: https://github.com/stlohrey/chatterbox-finetuning
And someone else fine-tuned it for German: https://huggingface.co/SebastianBodza/Kartoffelbox-v0.1
You're supposed to wait to post this until I agree to the bet ;)
I'm totally humbled by this.
I haven't seen this level of involvement for a lot of the models I'm using, including several text-to-speech models.
The rapidity of this is also quite shocking. I don't think Resemble anticipated this either, given their wording on the aforementioned ticket.
There's probably a lot more work to do to make this reliable (tuning learning rates, batching, etc.), but it's all clearly being put into place and given attention. Even if this model has some finicky fine-tuning behaviors, with this kind of willpower they'll be quickly overcome.
I suppose I owe you, haha.