
Comment by keyle

6 days ago

I don't mind so much the size in MB, the fact that it's pure CPU, or the quality; what I do mind, however, is the latency. I hope it's fast.

Aside: are there any voice-to-text models that run fully offline, without training?

I will be very impressed when we can have a conversation with an AI at a natural rate, and not "probe, space, response".

Voice-to-text fully offline can be done with Whisper. A few apps offer it for dictation or transcription.
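
For example, a minimal sketch using the openai-whisper Python package (the weights download once on first use and are cached; after that, transcription runs entirely on-device — "speech.wav" is a placeholder file):

    import whisper

    # Load a local Whisper model; "base" is small enough to run on CPU.
    # Weights are fetched on first use and cached, then no network is needed.
    model = whisper.load_model("base")

    # Transcribe a local audio file on-device (requires ffmpeg on PATH).
    result = model.transcribe("speech.wav")
    print(result["text"])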

"The brown fox jumps over the lazy dog.."

Average duration per generation: 1.28 seconds

Characters processed per second: 30.35

--

"Um"

Average duration per generation: 0.22 seconds

Characters processed per second: 9.23

--

"The brown fox jumps over the lazy dog.. The brown fox jumps over the lazy dog.."

Average duration per generation: 2.25 seconds

Characters processed per second: 35.04

--
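For context, numbers like these typically come from a simple timing loop along the lines below. This is a sketch, not the actual script used; synthesize() is a stand-in where the real TTS call would go:

    import time

    def synthesize(text):
        # Stand-in for the actual TTS call being benchmarked.
        time.sleep(0.01 * len(text))

    def benchmark(text, runs=5):
        durations = []
        for _ in range(runs):
            start = time.perf_counter()
            synthesize(text)
            durations.append(time.perf_counter() - start)
        avg = sum(durations) / len(durations)
        print(f"Average duration per generation: {avg:.2f} seconds")
        print(f"Characters processed per second: {len(text) / avg:.2f}")

    benchmark("The brown fox jumps over the lazy dog..")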

Test machine (/proc/cpuinfo excerpt):

    processor  : 0
    vendor_id  : AuthenticAMD
    cpu family : 25
    model      : 80
    model name : AMD Ryzen 7 5800H with Radeon Graphics
    stepping   : 0
    microcode  : 0xa50000c
    cpu MHz    : 1397.397
    cache size : 512 KB

  • Hmm, that actually seems extremely slow. Piper can crank out a sentence almost instantly on a Pi 4, which is like a sloth compared to that Ryzen, and the speech quality seems about the same at first glance.

    I suppose it would make sense if you want to run it on top of an LLM that's already occupying most of a GPU, since this could run in the limited VRAM that's left.

  • Assuming most answers will be more than a sentence, 2.25 seconds is already too long once you factor in the token generation in between... and imagine with reasoning! We're not there yet. (Rough numbers sketched below.)
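
    A rough back-of-envelope, with the LLM decode speed and answer length purely assumed for illustration:

        # All numbers assumed for illustration, except the 2.25 s measured above.
        tokens_per_second = 20    # assumed LLM decode speed
        answer_tokens = 150       # assumed answer length
        tts_seconds = 2.25        # measured above for two short sentences

        generation = answer_tokens / tokens_per_second   # 7.5 s
        total = generation + tts_seconds                 # if run sequentially
        print(f"generation: {generation:.1f}s, end-to-end: {total:.2f}s")

    Streaming both stages would hide much of that, but run sequentially it's nowhere near conversational latency.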