Comment by Ninjinka
18 days ago
Pricing is CRAZY.
Audio input is $0.70 per million tokens on 2.0 Flash, $0.075 for 2.0 Flash-Lite and 1.5 Flash.
For gpt-4o-mini-audio-preview, it's $10 per million tokens of audio input.
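To put those numbers side by side, here's a quick sketch that only uses the per-million-token rates quoted above. The dictionary keys are my own labels, not official model IDs. It also assumes both vendors' "audio tokens" are comparable units, which they probably aren't (each tokenizes audio differently), so the ratios are per token, not per minute of audio.

```python
# Audio-input prices in USD per 1M tokens, as quoted in the comment above.
# Keys are illustrative labels, not official API model identifiers.
PRICES_PER_M = {
    "gemini-2.0-flash": 0.70,
    "gemini-2.0-flash-lite": 0.075,
    "gemini-1.5-flash": 0.075,
    "gpt-4o-mini-audio-preview": 10.00,
}


def audio_input_cost(model: str, tokens: int) -> float:
    """USD cost for `tokens` audio-input tokens at the quoted rate."""
    return PRICES_PER_M[model] * tokens / 1_000_000


def price_ratio(a: str, b: str) -> float:
    """How many times more model `a` costs than model `b` per audio token."""
    return PRICES_PER_M[a] / PRICES_PER_M[b]


if __name__ == "__main__":
    for model in PRICES_PER_M:
        print(f"{model}: ${audio_input_cost(model, 1_000_000):.3f} per 1M tokens")
    print(f"gpt-4o-mini-audio-preview vs 2.0 Flash: "
          f"{price_ratio('gpt-4o-mini-audio-preview', 'gemini-2.0-flash'):.1f}x")
    print(f"2.0 Flash vs 1.5 Flash: "
          f"{price_ratio('gemini-2.0-flash', 'gemini-1.5-flash'):.1f}x")
```

Per token, that works out to roughly 14x between gpt-4o-mini-audio-preview and 2.0 Flash, and roughly 9x between 2.0 Flash and 1.5 Flash.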
The increase is likely because 1.5 Flash was actually cheaper than all other STT services. I wrote about this a while ago at https://ktibow.github.io/blog/geminiaudio/.
I feel that the audio-understanding side of the Gemini models isn't just STT. If you give it something like a song, it can tell you things about it.
Sadly: "Gemini can only infer responses to English-language speech."
https://ai.google.dev/gemini-api/docs/audio?lang=rest#techni...
I don't know what they mean by this, but the obvious interpretation isn't true. It understands other languages, and it even does really well with underrepresented ones, in my case Latvian.