Comment by mchusma

21 hours ago

I find it puzzling that Google doesn’t actively promote its own cloud for inference of Gemma 4. Open source is great, love it. But shouldn’t Google want me to be able to use and pay for it through Gemini and Vertex?

16 comments

WarmWash 20 hours ago

A key thing to understand about Google is that under the hood it is a collection of extremely powerful fiefdoms (many of which would stand as their own Fortune 500, hell, Fortune 100 company) that are all trying to act in their own interest. It's almost closer to a conglomerate than a company, where Google needs to bid internally against external players for resources.

If Gemma 4 is less lucrative than Claude to the Google Cloud kingdom, the Cloud kingdom will want you using Claude.

anthonypasq 19 hours ago

interesting. presumably this is why google is selling TPUs externally instead of hoarding them for deepmind.

Havoc 21 hours ago

There is a decent yt here going through what google's logic with gemma overall might be

https://www.youtube.com/watch?v=sXgZhGzqPmU

As for why Cloud offers it, I think it's just an effort to promote the brand. The Gemmas are pretty small, so they can host them without it being a major drain on the company. They have the infra anyway.

Farmadupe 21 hours ago

I wonder if for a model that small with a permissive license it might not be worth their time to host a commercial grade inference stack?

Might be easier to chuck it over the fence and let other providers handle it as it'll run in almost any commercial grade card?

Also speculating, but I wonder if it might also create a bit of a pricing problem relative to Gemini Flash Lite, depending on serving cost and quality of outputs?

As a comparison, despite being SotA for their size, the smallest qwen models on openrouter (27b and 35b) are not at all worth using, as there are way bigger and better models for less on a per-token basis.

mchusma 15 hours ago

If you were to believe a lot of the metrics, Gemma 31B is much better than Flash Lite. It seems like I should be able to pay Google to use it, and there should be at least a secondary call to action on how I can do that, but it’s missing from the blog post entirely.

disiplus 21 hours ago

I don't know what you are talking about. I replaced an older gpt4o with a finetuned qwen. There is a huge amount of "AI" that can be done with those models, or partly by those models. A huge number of people would not notice the difference. And if you prepare the context correctly, an even bigger slice of people would not notice.
- Farmadupe 20 hours ago
  
  If it helps, I mean it in a really literal sense. qwen3.6 27b is currently $3.20 per million tokens on openrouter, which is way overpriced. As good as the 27b is, kimi k2.5 is $3.00 and it's just in another league in terms of capability. There's no reason to spend money on the 27b.
  And even alibaba's own qwen3.6-plus is $1.95, so it's kinda easy to come to the conclusion that neither alibaba nor anyone else is really interested in hosting that model.
  And don't get me wrong, I fully agree with you, qwen3.6 27b is an amazing model. I run it on my own hardware and every day I'm constantly surprised with what it can zero shot.
- dakolli 21 hours ago
  
  Genuinely curious, what are you "fine tuning" these smaller models to do reliably? I hear this talked about a lot but very few people actually cough up examples, and I'd love to actually hear of one.

whoahwio 19 hours ago

Makes me wonder about the partnership with apple to use gemini. safe to assume apple has a preference for on-device, and the best open model (for consumer hardware at least) is a google property with an apache 2 license. Interesting dynamic and seemingly a bright spot in the market

fomoz 11 hours ago

You can use it for free with Google AI studio (free tier or paid tier accounts with different limits). Or use the paid version from Vertex AI which is around 3x cheaper than Gemini 3 Flash.

I'm using Gemma 4 31B in my app with 5 agents, 1.5k requests per day, each.

djyde 8 hours ago

I'm curious what tasks you use this model for?

nolist_policy 20 hours ago

What do you mean? It just works with Google AI Studio.

mchusma 15 hours ago

Part of the issue is Google's complex web of products. There’s Vertex, Gemini, Google AI Studio, Google Edge. But I literally had trouble finding how to use this in my existing paid Gemini API account.
