
Comment by nicktikhonov

18 hours ago

If you read the post, you'll see that I used Deepgram's Flux. It also does endpointing and is a higher-level abstraction than VAD.
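To make the distinction concrete: a bare VAD only labels each audio frame as speech or non-speech, and the application still has to decide when the speaker's turn is over. Below is a minimal sketch of that decision, using a fixed silence timeout. It assumes 20 ms frames and a hypothetical 500 ms threshold (neither value comes from the post). Flux handles this step itself, and the sketch does not reflect how Flux works internally.

```python
from typing import Iterable, List

FRAME_MS = 20     # assumed frame duration
SILENCE_MS = 500  # hypothetical end-of-turn threshold, not from the post


def detect_end_of_turn(vad_flags: Iterable[bool],
                       frame_ms: int = FRAME_MS,
                       silence_ms: int = SILENCE_MS) -> List[int]:
    """Return the frame indices at which a turn is declared finished.

    vad_flags holds one boolean per frame (True = the VAD judged it speech).
    A turn ends once speech has been heard and is followed by silence_ms
    of continuous non-speech.
    """
    needed = silence_ms // frame_ms  # consecutive silent frames required
    in_turn = False
    silent_run = 0
    ends: List[int] = []
    for i, is_speech in enumerate(vad_flags):
        if is_speech:
            in_turn = True   # speaker is talking (or resumed after a pause)
            silent_run = 0
        elif in_turn:
            silent_run += 1
            if silent_run >= needed:
                ends.append(i)  # enough silence: end of turn
                in_turn = False
                silent_run = 0
    return ends
```

A fixed timeout like this is the trade-off that higher-level endpointers try to avoid. A short threshold cuts users off during natural pauses, and a long one adds that delay to every turn.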

Sorry, I commented too soon. Did you also try Soniox? Why did you decide to use Deepgram's Flux (English only)?

  • I didn't try Soniox, but I made a note to check it out! I chose Flux because I was already using Deepgram for STT and happened to discover it while doing research. A good follow-up would be to try all the different endpointing solutions and see which one shaves off the most latency and feels most natural.

    Another good follow-up would be to try PersonaPlex, Nvidia's new model that would completely replace this architecture with a single model that does everything:

    https://research.nvidia.com/labs/adlr/personaplex/

I second Soniox, speaking as a user. In my experience it performs much better than Deepgram and others. If your app's architecture is clean enough, swapping providers shouldn't be too hard.
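One way to keep providers swappable is to make the rest of the app depend only on a small interface and wrap each vendor SDK in an adapter behind it. The sketch below is minimal. `Transcriber`, `TranscriptEvent`, `FakeTranscriber`, and `collect_turns` are hypothetical names, and no real Deepgram or Soniox SDK calls are shown.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Protocol


@dataclass
class TranscriptEvent:
    text: str
    end_of_turn: bool  # True when the provider signals the speaker has finished


class Transcriber(Protocol):
    """Hypothetical provider-agnostic streaming STT interface."""

    def stream(self, audio_chunks: Iterable[bytes]) -> Iterator[TranscriptEvent]: ...


class FakeTranscriber:
    """Stand-in adapter that replays a fixed script.

    A real adapter would wrap one vendor's SDK and translate its
    messages into TranscriptEvent objects.
    """

    def __init__(self, script: List[TranscriptEvent]):
        self.script = script

    def stream(self, audio_chunks: Iterable[bytes]) -> Iterator[TranscriptEvent]:
        # Emit one scripted event per incoming audio chunk.
        for _chunk, event in zip(audio_chunks, self.script):
            yield event


def collect_turns(stt: Transcriber, audio_chunks: Iterable[bytes]) -> List[str]:
    """App logic that depends only on the interface, not on any vendor."""
    turns: List[str] = []
    buffer: List[str] = []
    for event in stt.stream(audio_chunks):
        buffer.append(event.text)
        if event.end_of_turn:
            turns.append(" ".join(buffer))
            buffer = []
    return turns
```

With this design, trying Soniox instead of Deepgram would mostly mean writing one new adapter class. `collect_turns` and the rest of the pipeline would stay untouched.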