Comment by pjs_

6 days ago

Continue to believe that Cerebras is one of the most underrated companies of our time. It's a dinner-plate-sized chip. It actually works. It's actually much faster than anything else for real workloads. Amazing.

Nvidia seems cooked.

Google is crushing them on inference. By TPUv9, Google could be 4x more energy efficient and cheaper overall (even if Nvidia cuts its margins from 75% to 40%).

Cerebras will be substantially better for agentic workflows in terms of speed.

And if you don't care as much about speed, only about cost and energy, Google will still crush Nvidia.

And Nvidia won't be cheaper for training new models either. The vast majority of chips will be used for inference by 2028 instead of training anyway.

Nvidia has no exclusive manufacturing story, either. Anyone can buy TSMC's output.

Power is the bottleneck in the US (and everywhere besides China). By TPUv9, Google is projected to be 4x more energy efficient. Starting with TPUv8, once Google lets you run on-prem, it's a no-brainer who you're going with.

These are GW-scale data centers. You can't just build four large-scale nuclear power plants in a year in the US (or anywhere, even China). You can't just build 4 GW of solar farms in a year in the US to power your less efficient data center. Maybe you could in China (if the economics were on your side, but they aren't). You sure as hell can't do it anywhere else (maybe India).
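
To make the arithmetic behind that explicit, here's a rough back-of-envelope sketch. Every number in it is an illustrative assumption (a 1 GW deployment, a speculative 4x perf-per-watt gap), not a vendor figure:

```python
# Back-of-envelope for the power argument above. All numbers are
# illustrative assumptions, not vendor figures.

baseline_site_power_gw = 1.0   # assumed draw of the more efficient fleet
efficiency_gap = 4.0           # assumed (speculative) perf-per-watt advantage
reactor_gw = 1.0               # a large nuclear reactor is roughly 1 GW(e)

# Matching the same token throughput on 4x-less-efficient hardware needs
# proportionally more power.
required_power_gw = baseline_site_power_gw * efficiency_gap

print(f"Efficient fleet:      {baseline_site_power_gw:.0f} GW")
print(f"Less efficient fleet: {required_power_gw:.0f} GW "
      f"(~{required_power_gw / reactor_gw:.0f} large reactors' worth)")
```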

What am I missing? I don't understand how Nvidia could've been so far ahead and just let every part of the market slip away.

  • > let every part of the market slip away.

    Which part of the market has slipped away, exactly? Everything you wrote is supposition and extrapolation. Nvidia has a chokehold on the entire market. All other players still exist in the small pockets that Nvidia doesn't have enough production capacity to serve. And their dev ecosystem is still so far ahead of anyone else's. Which provider gets chosen to equip a 100k-chip data center goes far beyond raw chip power.

    • If code is getting cheaper, building CUDA alternatives and tooling should not be far off. I can't see Nvidia holding its position for much longer.

  • Man, I hope someone drinks Nvidia's milkshake. They need to get humbled back to the point where they're desperate to sell GPUs to consumers again.

    The only major roadblock is CUDA...

    • The nice thing about modern LLMs is that they're a relatively large, static use case. The compute is large and expensive enough that you can afford to just write custom kernels, to a degree. It's not like the general CUDA case, where you're running on 1, 2, or 8 GPUs, need libraries that already do it all for you, and researchers are building lots of different models.

      There aren't all that many different small components between all of the different transformer-based LLMs out there.

  • What puzzles me is that AMD can't secure any meaningful share of the AI market. They missed this train badly.

  • > What am I missing?

    Memory capacity, given the Cerebras/Groq architecture compared to Nvidia's.

    In parallel, the RAM contracts that Nvidia has negotiated well into the future, which other manufacturers have been unable to secure.

It's "dinner-plate sized" because it's just a full silicon wafer. It's nice to see wafer-scale integration finally being used for real work, but it's been researched for decades.

I'm fascinated by how the economy is catching up to demand for inference. The vast majority of today's capacity comes from silicon that merely happens to be good at inference, and it's clear that there's a lot of room for innovation when you design silicon for inference from the ground up.

With CapEx going crazy, I wonder where costs will stabilize and what OpEx will look like once these initial investments are paid back (or go bust). The consensus seems to be that there will be a rug pull and frontier model inference costs will spike, but I'm not entirely convinced.

I suspect it largely comes down to how much more efficient custom silicon is compared to GPUs, as well as how accurately the supply chain is able to predict future demand relative to future efficiency gains. To me, it is not at all obvious what will happen. I don't see any reason why a rug pull is any more or less likely than today's supply chain over-estimating tomorrow's capacity needs, and creating a hardware (and maybe energy) surplus in 5-10 years.

If history has taught us anything, “engineered systems” (like mainframes and hyperconverged infrastructure) emerge at the start of a new computing paradigm … but long-term, commodity compute wins the game.

  • Chips and RAM grew in capacity, but latency is mostly flat and interconnect power consumption grew a lot. So I think the paradigm changed, even with newer interconnects like NVLink.

    For 28 years, Intel Xeon chips have come with massive L2/L3 caches. Nvidia is making bigger chips, the latest being two big dies interconnected. Cerebras saw the pattern and took it to the next level.

    And the technology is moving toward 3D stacking of layers on the wafer, so there is room to grow that way, too.

  • I think that was true when you could rely on good old Moore's law to make the heavy iron quickly obsolete, but I also think those days are coming to an end.

Not for what they are using it for. It's $1M+ per chip and they can fit one of them in a rack. Rack space in DCs is a premium asset. The density isn't there. AI models need tons of memory (this product announcement is a case in point) and they don't have it, nor do they have a way to get it, since they are last in line at the fabs.

Their only chance is an acquihire, but Nvidia just spent $20B on Groq instead. Dead man walking.

  • Oh, don't worry. Ever since the power issue started developing, rack space is no longer at a premium. Or at least, it's no longer the limiting factor. Power is.

    • The dirty secret is that there is plenty of power. But it isn't all in one place, and it is often stranded in DCs that can't do the density needed for AI compute.

      Training models needs everything in one DC, inference doesn't.

  • The real question is: what's their perf/dollar vs Nvidia?

    • I guess it depends what you mean by "perf". If you optimize everything for the absolute lowest latency given your power budget, your throughput is going to suck, and vice versa. Throughput is ultimately what matters when everything about AI is so clearly power-constrained; latency is a distraction (see the rough sketch after this thread). So TPU-like custom chips are likely the better choice.

    • That's conflating two different use cases.

      Many coding use cases care about tokens/second, not tokens/dollar.

    • Exactly. They won't ever tell you. It is never published.

      Let's not forget that the CEO is an SEC felon who got caught trying to pull a fast one.
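
On the perf/dollar vs. tokens/second question in the thread above, a toy model of batched decoding shows why the two pull in opposite directions. The constants (a fixed per-step weight-streaming cost and a small per-sequence cost) are made-up round numbers, not measurements of any real chip:

```python
# Toy latency-vs-throughput model for batched LLM decoding. Larger batches
# amortize the fixed cost and raise total tokens/s, but each individual
# request sees its tokens arrive more slowly.

def decode_step_time_ms(batch_size, weight_read_ms=20.0, per_seq_ms=0.5):
    # One decode step pays a fixed cost (e.g. streaming weights from memory)
    # plus a small per-sequence cost; both constants are invented.
    return weight_read_ms + per_seq_ms * batch_size

for batch in (1, 8, 64, 256):
    step_ms = decode_step_time_ms(batch)
    per_user_tps = 1000.0 / step_ms      # tokens/s seen by a single request
    total_tps = per_user_tps * batch     # tokens/s across the whole batch
    print(f"batch={batch:4d}  per-user {per_user_tps:6.1f} tok/s  "
          f"total {total_tps:8.1f} tok/s")
```

Under this toy model, big batches win on tokens per dollar and per joule, while small batches win on tokens per second per user, which is exactly the split being argued over.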

Just wish they weren't so insanely expensive...

  • The bigger the chip, the worse the yield.

    • Cerebras has effectively 100% yield on these chips. Their internal structure is made by repeating the same small modular units over and over again, which means they can just fuse off the broken bits without affecting overall function (see the toy yield sketch after this thread). It's not like it is with a CPU.

    • I suggest reading their website; they explain pretty well how they manage good yield. Though I'm not an expert in this field, it does make sense, and I would be surprised if they were caught lying.
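
A toy defect model (my own illustrative sketch with made-up numbers, not Cerebras's published data) shows why tiling a wafer into many small, fusable cores sidesteps the usual yield argument:

```python
import math

# Assumed Poisson defect model with invented parameters.
defect_density_per_cm2 = 0.1   # assumed defects per cm^2
wafer_area_cm2 = 460.0         # rough usable area of a 300 mm wafer
tile_area_cm2 = 0.05           # assumed area of one small repeated core
spare_fraction = 0.015         # assumed fraction of cores kept as spares

# Monolithic design: the entire wafer-sized die must be defect-free.
monolithic_yield = math.exp(-defect_density_per_cm2 * wafer_area_cm2)

# Tiled design: each tiny core is defect-free with high probability, and the
# wafer works as long as the bad-core fraction stays under the spare budget
# (expected-value argument; a fuller model would use a binomial tail).
tile_yield = math.exp(-defect_density_per_cm2 * tile_area_cm2)
expected_bad_fraction = 1.0 - tile_yield

print(f"Monolithic wafer-sized die yield: {monolithic_yield:.2e}")
print(f"Expected bad-core fraction:       {expected_bad_fraction:.3%} "
      f"(spare budget {spare_fraction:.1%})")
```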

Technically, Cerebras's solution is really cool. However, I am skeptical that it will be economically useful for larger models, as the number of racks required scales with the size of the model in order to fit the weights in SRAM.
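
For a rough sense of that scaling, here's a back-of-envelope sketch. It assumes the commonly cited ~44 GB of on-chip SRAM per Cerebras wafer and 2-byte (FP16/BF16) weights; the model sizes are arbitrary examples:

```python
# Back-of-envelope: how many wafers just to hold a model's weights in SRAM.
sram_per_wafer_gb = 44.0   # commonly cited on-chip SRAM per Cerebras wafer
bytes_per_param = 2        # assume FP16/BF16 weights, ignore KV cache etc.

for params_b in (70, 405, 1000):   # model sizes in billions of parameters
    weight_gb = params_b * bytes_per_param   # 1e9 params * bytes / 1e9 bytes/GB
    wafers = weight_gb / sram_per_wafer_gb
    print(f"{params_b:5d}B params -> {weight_gb:6.0f} GB of weights "
          f"-> ~{wafers:5.1f} wafers just for the weights")
```

Even at a handful of wafers per model, that is the rack-count scaling the comment above is worried about.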

Yet investors keep backing NVIDIA.

  • At this point, tech investment and analysis is so divorced from any kind of reality that it's more akin to lemmings at the cliff than careful analysis of fundamentals.

Cerebras is a bit of a stunt, like "datacenters in spaaaaace".

Terrible yield: one defect can ruin a whole wafer instead of just a chip region. Poor perf./cost (see above). Difficult to program. Little space for RAM.

  • They claim the opposite, though, saying the chip is designed to tolerate many defects and work around them.