Comment by anonzzzies
2 days ago
We need custom inference chips at scale for this imho. Every computer (whatever the form factor or board) should have an inference unit on it, so that at least inference is efficient and fast and can be offloaded while the CPU is doing something else.
The bottleneck in common PC hardware is mostly memory bandwidth. Offloading the computation part to a different chip wouldn’t help if memory access is the bottleneck.
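To make that concrete, here's a back-of-the-envelope sketch (my own illustrative numbers, not benchmarks from the thread): during single-stream decode, every weight gets streamed from memory once per token, so tokens/sec is capped by bandwidth divided by model size no matter how fast the attached compute unit is.

    # Rough upper bound on decode speed when memory-bound.
    # Numbers below are illustrative guesses, not measurements.
    def max_tokens_per_sec(model_size_gb: float, bandwidth_gb_s: float) -> float:
        """Tokens/sec ceiling: read all weights once per generated token."""
        return bandwidth_gb_s / model_size_gb

    # Dual-channel DDR5 desktop (~90 GB/s), 8B model quantized to ~4.5 GB:
    print(max_tokens_per_sec(4.5, 90))    # ~20 tok/s, regardless of TOPS
    # Same model in GPU VRAM at ~1000 GB/s:
    print(max_tokens_per_sec(4.5, 1000))  # ~220 tok/s

Bolting a faster matrix engine onto the same DDR bus doesn't move that ceiling; only more bandwidth does.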
There have been boards and chips with dedicated compute hardware for years, but they're only so useful for LLMs, which require huge memory bandwidth.
It's also worth noting that memory bus bandwidth has seen very little improvement over the years, and even the onboard RAM on GPU cards has seen mediocre upgrades. If everyone and their grandma weren't using Nvidia GPUs, we would probably have seen a more competitive market and greater changes outside the chip itself.
I don't think that's true. AMD, Apple and Intel are all dGPU competitors with roughly the same struggle bringing upgrades to market. They have every incentive to release a disruptive product, but refuse to invest in their ecosystem the way Nvidia did.
Almost all of them have one already. Microsoft's "Copilot+" branding requires an NPU with a minimum number of TOPS.
It's just that practically nothing uses those NPUs.
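As a quick illustration of how invisible they are to most software: here's a hedged sketch of asking ONNX Runtime which execution providers it can actually use on a given machine. The NPU-facing provider names (e.g. QNNExecutionProvider for Qualcomm parts) only show up if you install a build that targets them, which stock installs usually don't.

    # Sketch: list the accelerators this ONNX Runtime build can target.
    # On a typical pip install you only get CPU (maybe CUDA/DirectML);
    # NPU providers rarely appear, which is the point above.
    import onnxruntime as ort

    available = ort.get_available_providers()
    print(available)  # e.g. ['CPUExecutionProvider'] on a stock install

    if "QNNExecutionProvider" in available:  # Qualcomm NPU-enabled build
        print("NPU-capable build detected")
    else:
        print("No NPU provider; inference falls back to CPU/GPU")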
Look at the specs of this Orange Pi 6 Plus board - a dedicated 30 TOPS NPU.
https://boilingsteam.com/orange-pi-6-plus-review/
At this point in the timeline, compute is cheap; it's RAM that's basically unavailable.
I can't believe this was downvoted. It makes a lot of sense that mass-market custom inference chips would be highly useful.
It's quite easy to understand. The tech industry has gone through 4-5 generations of obsolete NPU hardware that was dead on arrival. Meanwhile, there are still GPUs from 2014-2016 that run CUDA and are more power-efficient than the NPUs.
The industry has to copy CUDA, or give up and focus on raster. ASIC solutions are a snipe hunt, not to mention small and slow.