Comment by mpaepper
21 hours ago
You should look into the new Nvidia model: https://research.nvidia.com/labs/adlr/personaplex/
It has dual-channel input/output and a very permissive license
Oh man that space emergency example had me rolling
Ha --
and the "Customer Service - Banking" scenario claims to demo "accent control": the prompt gives the agent a distinctly non-Indian name, yet the agent sounds 100% Indian. I found that hilarious, but isn't it a bad example given they're claiming accent control as a feature?
"Sanni Virtanen", I guess it was meant to be Finnish? Maybe the "bank customer support" part threw the AI off, lmao.
Changing my title to "Astronaut" right now... I'll be using that line as well anytime someone asks me to do something.
Oh wow. That's definitely something…
Thanks for sharing this! I'm going to put this on my list to play around with. I'm not really an expert in this tech, I come from an audio background, but I was recently playing around with streaming Speech-to-Text (using Whisper) / Text-to-Speech (using Kokoro at the time) on a local machine.
The most challenging part of my build was tuning the inference batch size. I was able to get Speech-to-Text working well down to batch sizes of 200ms. I even implemented a basic local agreement algorithm and it was still very fast (inference time, I think, was around 10-20ms?). You're basically limited by the minimum batch size, NOT inference time. Maybe that's the missing "secret sauce" suggested in the original post?
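For anyone curious, the core of the local agreement idea is tiny: only commit the tokens that two consecutive hypotheses agree on, so flickering tails near the audio boundary never get emitted. Here's a minimal sketch (my own function name and token-list representation, not the actual code from my build):

```python
def local_agreement(prev_hyp, curr_hyp, committed):
    """Return new tokens safe to emit, given two consecutive
    transcription hypotheses and the tokens already committed.

    prev_hyp, curr_hyp: token lists from two successive inference passes
    committed: tokens already emitted to the user
    """
    # Longest common prefix of the two hypotheses is considered "stable".
    stable = []
    for a, b in zip(prev_hyp, curr_hyp):
        if a != b:
            break  # hypotheses diverge; everything after is still in flux
        stable.append(a)
    # Emit only what's stable beyond the already-committed prefix.
    return stable[len(committed):]

# Example: the unstable tail ("on" vs "in the") is held back.
committed = ["the"]
prev = ["the", "cat", "sat", "on"]
curr = ["the", "cat", "sat", "in", "the"]
print(local_agreement(prev, curr, committed))  # ['cat', 'sat']
```

In a real streaming loop you'd run this after each ~200ms chunk's inference pass, append the returned tokens to `committed`, and shift `curr_hyp` into `prev_hyp`.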
In the use case listed above, the TTS probably isn't a bottleneck as long as OP can generate tokens quickly enough.
All this being said, a wrapped model like this that can handle hand-offs between these parts of the pipeline sounds really useful, and I'll definitely be interested in seeing how it performs.
Let me know if you guys play with this and find success.
oh - very interesting indeed! thanks