Comment by tucnak
1 day ago
Ever since the n300 came out, Tenstorrent has shared its roadmap publicly, so I've been waiting for the next-generation hardware. They also announced the p300 yesterday (which puts two Blackhole chips on one card, akin to what the n300 did before).
So that would put a p300 unit at 64 GB GDDR6 and 1 TB/s bandwidth. Very competitive, considering that Tenstorrent is now the only vendor to offer scale-out at a reasonable price point. Whoever figures out how to make it work with Corundum[1] for unlimited K/V cache offloading is going to make a lot of money: as agents spend more time executing tool code and become decoupled from chats, individual jobs will take longer and longer, so scheduling will become more important. How do you manage TBs of K/V cache concurrently?
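To put rough numbers on that question, here is a back-of-the-envelope sketch of K/V cache sizing. The model shape (80 layers, 8 K/V heads, head dimension 128, fp16) and the 128k-token context are hypothetical values picked for illustration, not the specs of any particular model, and the estimate ignores weights and activations entirely.

```python
def kv_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # One key tensor and one value tensor per layer, hence the factor of 2.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def kv_bytes_per_job(context_tokens, **shape):
    # Cache grows linearly with the number of tokens held in context.
    return context_tokens * kv_bytes_per_token(**shape)


# Hypothetical model shape, assumed for illustration only.
shape = dict(n_layers=80, n_kv_heads=8, head_dim=128)

per_token = kv_bytes_per_token(**shape)          # bytes of K/V per token
per_job = kv_bytes_per_job(128 * 1024, **shape)  # one 128k-token job

card_bytes = 64 * 10**9                  # the 64 GB figure quoted above
jobs_on_card = card_bytes // per_job     # full-context jobs that fit on-card
jobs_per_tb = 10**12 // per_job          # full-context jobs per TB of cache

print(f"{per_token / 1024:.0f} KiB per token")
print(f"{per_job / 2**30:.0f} GiB per 128k-token job")
print(f"{jobs_on_card} job(s) per 64 GB card, {jobs_per_tb} per TB")
```

Under those assumptions a single long-running job takes about 40 GiB of cache, so a couple of dozen parked agent jobs already add up to a terabyte, which is why offloading and scheduling start to matter.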
People complaining about bandwidth are not seeing the bigger picture, probably because they're unaware that NVMe-oF exists and never kept up with modern network topologies; the hyperscaler Kool-Aid doesn't include either.