Comment by lostmsu

7 days ago

I built a speech-to-text API at $0.06/h. Currently making about $5k MRR. The pricing model is flat rate + throttled API for experimentation. The first paying customers came from Reddit comments on relevant topics. Speech transcription at scale is expensive with the majority of current cloud providers.

Biggest challenge was getting the first few customers (is there anyone for whom this was not the case?).

https://borgcloud.org/speech-to-text

Congratulations!

As the other commenter mentioned, this seems like an extremely competitive space. But I did a quick Google for speech-to-text and your solution is a few times cheaper than most others.

I know you may not want to share your secret sauce, but I'm curious as to how you managed that. I'm guessing you use Whisper. Is the trick just to run it on your own hardware instead of paying AWS prices for compute?

  • I think people who don't have tech skills aren't aware of cheaper solutions, so you can still make a profit. But yeah, there are many cheaper solutions: e.g. Groq provides Whisper Large v3 Turbo for $0.04 per hour: https://groq.com/pricing . Gemini 2.0 Flash is around $0.08 per hour.

    • Oh, did not know Groq offers ASR at that rate. Will be tough to compete with.

      Will probably have to at least match their price for the STT API and take a hit to revenue. But we are working on an end-to-end voice offering, and Groq does not seem to have anything in that area yet, so hopefully we will stay afloat.

Cool and congratulations! This feels like a very competitive space, especially with Whisper etc. being so good.

Btw, looks like you made a typo in your first paragraph, writing text-to-speech instead of speech-to-text.