David Patterson: Challenges and Research Directions for LLM Inference Hardware

4 hours ago (arxiv.org)

David Patterson is such a legend! From RAID to RISC and one of the best books in computer architecture, he's in my personal hall of fame.

Several years ago I was at one of the Berkeley AMPLab retreats at Asilomar, and as I was hanging out, I couldn't figure out how I knew the person in front of me, until an hour later when I saw his name during a panel :)).

It was always the network. And David Patterson, after RISC, started working on IRAM, which was tackling the same problem.

NVIDIA bought Mellanox/InfiniBand, but Google has historically excelled at networking, and the TPU seems to be designed to scale out in the best possible way.

> To address these challenges, we highlight four architecture research opportunities: High Bandwidth Flash for 10X memory capacity with HBM-like bandwidth; Processing-Near-Memory and 3D memory-logic stacking for high memory bandwidth; and low-latency interconnect to speedup communication.

A piece on High Bandwidth Flash (HBF) got submitted 6 hours ago! It's a great article, with fantastic coverage of a wide section of the rapidly moving industry. https://blocksandfiles.com/2026/01/19/a-window-into-hbf-prog...

HBF is about having many dozens or hundreds of channels of flash memory. The idea of having Processing Near HBF, spread out, perhaps in a mixed 3D design, would not be at all surprising to me. One of the main challenges for HBF is building improved vias and improved stacking, and if that tech advances, the idea of mixing NAND and compute layers, rather than just stacking NAND, perhaps opens up too.

These are all really exciting possible next steps.