Comment by websiteapi

2 months ago

I wonder if it works for speaker diarization out of the box. I've found that open source speaker diarization that doesn't require a lot of tweaking is basically non-existent.

6 comments

websiteapi

hamza_q_ 2 months ago

Yeah I was frustrated by slow and hard to use OSS diarization too; recently released a library to address that, check it out: https://github.com/narcotic-sh/senko

Also https://zanshin.sh, if you'd like speaker diarization when watching YouTube videos

noman-land 2 months ago
Hey, thanks for this. Been trying it out and it's very fast but seems to hear more speakers than are in the audio. I didn't see a way to tweak speaker similarity settings or merge speakers in some way. Any advice?
- hamza_q_ 2 months ago
  
  Thanks for checking it out!
  Yeah unfortunately, since the diarization is acoustic features based, it really does require high recorded voice fidelity/quality to get the best results. However, I just added another knob to the Diarizer class called mer_cos, which controls the speaker merging threshold. The default is 0.875, so perhaps try lowering to 0.8. That should help.
  I'll also get around to adding a oracle/min/max speakers feature at some point, for cases where you know the exact number of speakers ahead of time, or wanna set upper/lower bounds. Gotten busy with another project, so haven't done it yet. PR's welcome though! haha
  
  2 replies →
websiteapi 2 months ago

looks interesting. will check it out.