Comment by KaiserPro

4 hours ago

Gemini-pro-preview is on ollama and requires an H100, which is ~$15-30k. Google are charging $3 per million tokens. Supposedly it's capable of generating between 1 and 12 million tokens an hour.

Which is profitable, but not by much.
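The margin claim can be sanity-checked with a quick back-of-envelope calculation, using only the figures from the comment (H100 price, $3 per million tokens, 1-12 million tokens/hour) plus an assumed 3-year depreciation window. This is a sketch, not a real cost model: it ignores power, cooling, the host server, and utilization below 100%.

```python
# Back-of-envelope inference margin. All figures are from the thread or
# assumptions: $20k GPU (midpoint of the $15-30k range), 3-year amortization.
HOURS_PER_YEAR = 24 * 365

gpu_cost = 20_000                      # USD, assumed midpoint of $15-30k
amort_years = 3                        # assumed depreciation window
amort_per_hour = gpu_cost / (amort_years * HOURS_PER_YEAR)

price_per_mtok = 3.0                   # USD per million tokens (from thread)
for mtok_per_hour in (1, 12):          # throughput range claimed in thread
    revenue = price_per_mtok * mtok_per_hour
    margin = revenue - amort_per_hour
    print(f"{mtok_per_hour:>2} Mtok/h: revenue ${revenue:.2f}/h, "
          f"hardware ${amort_per_hour:.2f}/h, margin ${margin:.2f}/h")
```

Hardware amortization comes out around $0.76/hour, so at the low end (1 Mtok/h, $3/h revenue) the margin is a couple of dollars an hour before power and overhead, which is consistent with "profitable, but not by much"; at 12 Mtok/h the picture is much better.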

What do you mean, it's on ollama and requires an H100? As a proprietary Google model, it runs on Google's own hardware, not Nvidia's.

  • Sorry, a lack of context:

    https://ollama.com/library/gemini-3-pro-preview

    You can run it on your own infra. Anthropic and OpenAI are running off Nvidia, as are Meta (well, supposedly they have custom silicon; I'm not sure if it's capable of running big models) and Mistral.

    However, if Google really are running their own inference hardware, then the cost structure is different (developing silicon is not cheap...), as you say.

    • That's a cloud-linked model. The listing is about using ollama as an API client (for ease of compatibility with other use cases, including local models), not about running that model on local infra. Google does release open models (called Gemma), but they're not nearly as capable.