Comment by hamza_q_
1 month ago
Yeah, I was frustrated by slow, hard-to-use OSS diarization too. I recently released a library to address that, check it out: https://github.com/narcotic-sh/senko
Also https://zanshin.sh, if you'd like speaker diarization when watching YouTube videos
Hey, thanks for this. Been trying it out and it's very fast but seems to hear more speakers than are in the audio. I didn't see a way to tweak speaker similarity settings or merge speakers in some way. Any advice?
Thanks for checking it out!
Yeah, unfortunately, since the diarization is based on acoustic features, it really does need high-fidelity voice recordings to get the best results. However, I just added another knob to the `Diarizer` class called `mer_cos`, which controls the speaker merging threshold. The default is 0.875, so perhaps try lowering it to 0.8. That should help.
I'll also get around to adding an oracle/min/max speakers feature at some point, for cases where you know the exact number of speakers ahead of time, or wanna set upper/lower bounds. I've gotten busy with another project, so I haven't done it yet. PRs welcome though! haha
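For readers wondering why lowering `mer_cos` reduces the speaker count, here is a minimal sketch of threshold-based merging. It is not Senko's actual implementation: the function names, the use of plain cosine similarity on centroid vectors, and the transitive (single-linkage) merge are all assumptions made for illustration. Only the 0.875 default and the suggested 0.8 come from the comment above.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def merge_speakers(centroids, threshold=0.875):
    """Hypothetical helper: give two speaker centroids the same label
    when their cosine similarity is >= threshold (merges are transitive).
    Returns one label per input centroid, numbered from 0."""
    n = len(centroids)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if cosine(centroids[i], centroids[j]) >= threshold:
                parent[find(j)] = find(i)

    # Relabel the surviving roots as 0, 1, 2, ... in order of appearance.
    remap = {}
    return [remap.setdefault(find(i), len(remap)) for i in range(n)]

# Toy centroids: the first two are about 0.85 similar, the third is far away.
toy = [[1.0, 0.0], [0.85, 0.5268], [0.0, 1.0]]
labels_default = merge_speakers(toy, threshold=0.875)  # first two stay separate
labels_lowered = merge_speakers(toy, threshold=0.8)    # first two get merged
```

In this toy case the two similar centroids stay separate at 0.875 and collapse into one speaker at 0.8, which is the effect described: a lower threshold merges more aggressively, so over-split speakers get folded together.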
Thanks, `mer_cos` definitely gets me closer. I appreciate that. Yeah, I was thinking providing a param for the expected number of speakers would be nice. I'll check out the codebase and see if that's something I can contribute :).
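One way the expected-number-of-speakers idea could work, purely as a sketch: keep merging the two most similar clusters until the count is within the bound. Nothing here reflects Senko's codebase. `cap_speakers`, `max_speakers`, and the mean-centroid merge rule are hypothetical names and choices for illustration, and the embeddings are assumed to be non-zero vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def centroid(embeddings, members):
    """Element-wise mean of the embeddings listed in `members`."""
    vectors = [embeddings[i] for i in members]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cap_speakers(embeddings, max_speakers):
    """Hypothetical helper: greedily merge the most similar pair of
    clusters until at most `max_speakers` remain. Returns one label
    per input embedding."""
    if max_speakers < 1:
        raise ValueError("max_speakers must be at least 1")

    # Start with every embedding as its own cluster.
    clusters = [[i] for i in range(len(embeddings))]

    while len(clusters) > max_speakers:
        best = None  # (similarity, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = cosine(centroid(embeddings, clusters[a]),
                             centroid(embeddings, clusters[b]))
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]

    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Four over-split speakers that are really two voices.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
capped = cap_speakers(toy, max_speakers=2)
untouched = cap_speakers(toy, max_speakers=4)
```

Setting `max_speakers` to the known count gives the "oracle" case, and a lower bound could be handled the same way by refusing to merge below a minimum. The pairwise search is quadratic per merge, which is fine for a handful of speakers but would need a smarter structure for large cluster counts.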
Looks interesting. Will check it out.