Comment by cjbgkagh
5 hours ago
It seems like general improvements in ram efficiency, such as that used in Gemma 4, means it’s back to memory bandwidth as the bottleneck and less about total available memory size. I’m also curious to see how much more agent autonomy will reduce less need for low latency and shift the focus to more throughput. Meaning it’s easier to spread the model out over multiple smaller GPUs and use pipeline parallelism to keep them busy. This would also mean using ram for market discrimination becomes less effective.
No comments yet
Contribute on Hacker News ↗