Comment by sipjca

7 months ago

i've approached the same thing but slightly differently. i can run it on consumer hardware for vastly cheaper than the cloud and don't have to worry about image sizes at all (bare metal is 'faster'). i'm offering 20,000 minutes of transcription for free, up to the rate limit (1 request every 5 seconds)

https://geppetto.app

I contributed "whisperfile" as a result of this work:

* https://github.com/Mozilla-Ocho/llamafile/tree/main/whisper....

* https://github.com/cjpais/whisperfile

if you ever want to chat about making transcription virtually free, or at least very cheap, for everyone, let me know. I've been working on various projects related to it for a while, including an open source, cross-platform superwhisper alternative: https://handy.computer

2 comments

ilyakaminsky 7 months ago

> i can run it on consumer hardware for vastly cheaper than the cloud

Woah, that's really cool, CJ! I've been toying with the idea of standing up a cluster of older iPhones to run Apple's Speech framework. [1] The inspiration came from this blog post [2] where the author uses an old iPhone for OCR. A couple of things are holding me back: (1) the OSS models are better according to the current benchmarks and (2) I have customers all over the world, so geographical load-balancing is a real factor. With that said, I'll definitely spend some time checking out your work. Thanks for sharing!

[1] https://developer.apple.com/documentation/speech

[2] https://terminalbytes.com/iphone-8-solar-powered-vision-ocr-...

sipjca 7 months ago

ty! if there's any way I can help just lmk, always happy to lend a hand or an ear