Comment by yujonglee

6 months ago

Happy to answer any questions!

This is the list of local models it supports:

- whisper-cpp-base-q8

- whisper-cpp-base-q8-en

- whisper-cpp-tiny-q8

- whisper-cpp-tiny-q8-en

- whisper-cpp-small-q8

- whisper-cpp-small-q8-en

- whisper-cpp-large-turbo-q8

- moonshine-onnx-tiny

- moonshine-onnx-tiny-q4

- moonshine-onnx-tiny-q8

- moonshine-onnx-base

- moonshine-onnx-base-q4

- moonshine-onnx-base-q8

13 comments


phkahler 6 months ago

I thought whisper and others took large chunks (20-30 seconds) of speech, or a complete wave file as input. How do you get real-time transcription? What size chunks do you feed it?

To me, STT should take a continuous audio stream and output a continuous text stream.

yujonglee 6 months ago
I use VAD to chunk audio.
Whisper and Moonshine both work on chunks, but for Moonshine:
> Moonshine's compute requirements scale with the length of input audio. This means that shorter input audio is processed faster, unlike existing Whisper models that process everything as 30-second chunks. To give you an idea of the benefits: Moonshine processes 10-second audio segments 5x faster than Whisper while maintaining the same (or better!) WER.
Also, with Kyutai, we can feed continuous audio in and get continuous text out.
- https://github.com/moonshine-ai/moonshine
- https://docs.hyprnote.com/owhisper/configuration/providers/k...
- zveyaeyv3sfye 6 months ago
  
  Having used whisper and noticed the useless quality due to their 30-second chunks, I would stay far away from software working on even a shorter duration.
  The short duration effectively means that the transcription will start producing nonsense as soon as a sentence is cut up in the middle.
- mijoharas 6 months ago
  
  Something like that, in a cli tool, that just gives text to stdout would be perfect for a lot of use cases for me!
  (maybe with an `owhisper serve` somewhere else to start the model running or whatever.)
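
For readers wondering what "use VAD to chunk audio" can look like in practice, here is a minimal sketch. It is not owhisper's implementation: the thread does not say which VAD is used, so a simple RMS energy threshold stands in for a real detector. The function names (`frame_rms`, `vad_chunks`) and every parameter value (20 ms frames, 0.02 threshold, 300 ms of allowed silence) are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """RMS energy of each non-overlapping frame (a trailing partial frame is dropped)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def vad_chunks(
    samples: Sequence[float],
    sample_rate: int = 16000,   # assumed input rate
    frame_ms: int = 20,         # assumed frame size
    threshold: float = 0.02,    # assumed energy threshold for "speech"
    max_silence_ms: int = 300,  # assumed pause length that closes a chunk
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of speech chunks.

    A chunk opens on the first frame at or above the threshold and closes
    once more than max_silence_ms of consecutive quiet frames follow it.
    """
    frame_len = sample_rate * frame_ms // 1000
    max_silent_frames = max_silence_ms // frame_ms
    energies = frame_rms(samples, frame_len)

    chunks: List[Tuple[int, int]] = []
    start = None  # frame index where the current chunk began
    silent = 0    # consecutive quiet frames seen inside the current chunk
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if start is None:
                start = i
            silent = 0
        elif start is not None:
            silent += 1
            if silent > max_silent_frames:
                end = i - silent + 1  # index of the first quiet frame
                chunks.append((start * frame_len, end * frame_len))
                start, silent = None, 0
    if start is not None:
        # Audio ended mid-chunk: trim any trailing quiet frames.
        end = len(energies) - silent
        chunks.append((start * frame_len, end * frame_len))
    return chunks
```

Each returned range would then be passed to the model as its own chunk. Because chunks end on pauses rather than at fixed intervals, sentences are less likely to be cut in the middle, though a pure energy threshold will misfire on noisy input.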
  

shekhar101 6 months ago

FYI: `owhisper pull whisper-cpp-large-turbo-q8` fails with: Failed to download model.ggml: Other error: Server does not support range requests. Got status: 200 OK

But the base-q8 works (and works quite well!). The TUI is really nice. Speaker diarization would make it almost perfect for me. Thanks for building this.

yujonglee 6 months ago

We store the data in R2, and range queries sometimes glitch... It might work if you retry.
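
If the failure is intermittent, as suggested above, the retry can be scripted. The following is a minimal sketch of a generic retry helper with exponential backoff. The helper name `retry` and its parameters are hypothetical and not part of owhisper; the only owhisper command referenced is the `owhisper pull` invocation quoted earlier in the thread.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    fn: Callable[[], T],
    attempts: int = 3,        # assumed number of tries
    base_delay: float = 1.0,  # assumed first wait in seconds, doubled each time
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn until it succeeds or the attempts run out.

    The last failure is re-raised unchanged so the caller still sees
    the original error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Example use (not executed here): retry the pull that failed above.
#
#   import subprocess
#   retry(lambda: subprocess.run(
#       ["owhisper", "pull", "whisper-cpp-large-turbo-q8"], check=True))
```

With `check=True`, `subprocess.run` raises `CalledProcessError` on a non-zero exit status, which is what triggers the next attempt; this assumes the CLI exits non-zero when the download fails.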

alkh 6 months ago

Sorry, maybe I missed it, but I didn't see this list on your website. I think it would be a good idea to add this info there. Besides that, thank you for the effort and your work! I will definitely give it a try.

yujonglee 6 months ago

got it. fyi if you run `owhisper pull --help`, this info is printed