Comment by spwa4

23 days ago

I've been trying to do this, but I can't get voice recognition to run fast enough (meaning live) with Gemma E2B on an M1 Max (64GB), a 5060 Ti (16GB), or a Snapdragon 8 Gen 2.

Any pointers?

What's your average response time on the M1 Max, and what's your target?

  • I'm only at about 650 ms and, well, ideally 100 ms would be great.

    • Well, in my demo it's around 2.5 s, and I already consider that "real-time". One way to improve it is to disable image input.
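
Since the numbers in this thread (650 ms, 2.5 s, a 100 ms target) are only comparable if they are measured the same way, here is a minimal timing sketch. It assumes nothing about Gemma or any particular runtime: `measure_latency_ms` and `fake_transcribe` are hypothetical names, and `fake_transcribe` is a stand-in you would replace with your own end-to-end call (audio in, text out).

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(fn: Callable[[], object], runs: int = 20, warmup: int = 3) -> Dict[str, float]:
    """Time repeated calls to fn and return summary latencies in milliseconds."""
    # Warm-up calls are discarded so one-time costs (model load, caches) do not skew the result.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Simple nearest-rank style index for the 95th percentile, clamped to the last sample.
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "mean_ms": statistics.fmean(samples),
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
    }


def fake_transcribe() -> str:
    """Hypothetical placeholder for a real speech-recognition call."""
    time.sleep(0.01)  # simulated work only, not a real model
    return "hello"


if __name__ == "__main__":
    stats = measure_latency_ms(fake_transcribe, runs=10, warmup=2)
    target_ms = 100.0  # the target mentioned in the thread
    print(stats, "meets target:", stats["median_ms"] <= target_ms)
```

Reporting the median and p95 next to the mean helps separate a steady 650 ms from a fast path with occasional slow outliers.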