Comment by Tuna-Fish

11 hours ago

We are very, very far from thermodynamic limits. Lots of people have done the math: current-gen systems use ~1,000,000,000x (10^9) more power than the Landauer limit, and ~100,000x (10^5) more power than an ideal digital implementation on existing CMOS.
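
For scale, the Landauer limit is k_B * T * ln 2 per erased bit. A minimal sketch, assuming room temperature (300 K) and treating the ~10^9x ratio quoted above as a flat per-bit multiplier; both are simplifying assumptions for illustration, not measurements:

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def landauer_limit_joules(temperature_kelvin: float) -> float:
    """Minimum energy to erase one bit at the given temperature: k_B * T * ln 2."""
    return K_B * temperature_kelvin * math.log(2)

# Assumption: room temperature of 300 K.
limit = landauer_limit_joules(300.0)

# Assumption: apply the ~1e9 ratio from the comment as a per-bit multiplier.
implied_per_bit = limit * 1e9

print(f"Landauer limit at 300 K: {limit:.3e} J per bit")
print(f"Implied current cost:    {implied_per_bit:.3e} J per bit")
```

The limit comes out at a few zeptojoules per bit, so a 10^9x gap puts the implied cost in the picojoule-per-bit range.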

Currently, most AI systems are built with a large pool of memory on one side, compute on the other side, and a very fat pipe between them. 90%+ of all energy goes into moving data from one side to the other and into selecting the specific element you want from the large pool of RAM. The energy cost of holding that data in memory and reading it from the memory cells, and the energy cost of doing the actual computation in low-precision FP, are both trivial in comparison.
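
A toy energy model makes the shape of that argument concrete. This is a sketch only: `energy_breakdown`, the workload size, and every per-operation energy below are hypothetical placeholders, not measurements of any real chip.

```python
def energy_breakdown(bytes_moved, flops, pj_per_byte_moved, pj_per_flop):
    """Return (movement_pj, compute_pj, movement_fraction) for one workload."""
    movement = bytes_moved * pj_per_byte_moved
    compute = flops * pj_per_flop
    return movement, compute, movement / (movement + compute)

# Hypothetical workload: one token of batch-1 inference, assuming 1 byte per
# weight, each weight fetched once and used for one multiply-add (2 FLOPs).
weight_bytes = 1_000_000_000
flops = 2 * weight_bytes

# Placeholder energies (illustrative only): moving a byte between memory and
# compute is assumed to cost far more than one low-precision FLOP.
movement, compute, fraction = energy_breakdown(
    bytes_moved=weight_bytes,
    flops=flops,
    pj_per_byte_moved=20.0,
    pj_per_flop=0.5,
)
print(f"data movement share of energy: {fraction:.0%}")
```

With these placeholder numbers, data movement dominates the total. Lowering `pj_per_byte_moved` is a crude stand-in for keeping each weight next to the unit that consumes it.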

The systems are built this way because this is the most flexible architecture and can be used for many different kinds of workloads. But a transformer workload in no way requires this flexibility: all the data is fairly local to the execution units that consume it. If you design a system as full PIM (processing-in-memory), where each ALU is co-located with a small storage pool that contains only the elements used by that ALU, and then tile that out to implement the full model, you cut out most of the energy cost of moving data. The cost is that you need much more silicon to implement a working system, but the benefit is not just improved energy efficiency, but also token speed and silicon efficiency.

The industry is moving towards such designs: many startups are working towards them with different approaches, there's Nvidia's recent acquisition* of Groq, etc. There is a well-understood path towards ~1000x higher token speeds at ~1000x better energy efficiency that requires no new innovations, just investment of money into specialization.

There are even more gains if you move the weights into ROM, but that would require you to specialize not just for a specific type of model but also for a specific set of model weights, à la Taalas.

I find the AI discourse is diseased. On one side you get people breathlessly overestimating the current state of the industry and the progress that's going to happen in the next ~2 years. On the other side, people assume that the technology as it is now is what it will always be, and completely ignore that the industry is aware of, and actively working on, many ways to improve the hardware. It's just that complex leading-edge silicon chips take years to go from idea to working product, and transformer inference was only very recently proven to be a market large enough to specialize for.
