Comment by tucnak

1 day ago

Ever since the n300 came out, Tenstorrent has shared its roadmap publicly, so I've been waiting for the next-generation hardware. Yesterday they also announced the p300, which puts two Blackhole chips on one card, akin to what the n300 did before.

So that would put a p300 unit at 64 GB of GDDR6 and 1 TB/s of bandwidth. Very competitive, considering that Tenstorrent is now the only vendor to offer scale-out at a reasonable price point. Whoever figures out how to make it work with Corundum[1] for unlimited K/V cache offloading is going to make a lot of money: as agents spend more time executing tool code, decoupled from chats, individual jobs will run longer and longer, so scheduling will become more important. How do you manage TBs of K/V cache concurrently?
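
To put rough numbers on that question, here is a back-of-envelope sketch in Python. Everything except the 64 GB card capacity is an assumption picked for illustration: the model shape (80 layers, 8 K/V heads, head dimension 128, fp16), the 128k-token context, and the count of 100 concurrent jobs are all hypothetical, and real deployments will differ with quantization, paging and so on.

```python
# Back-of-envelope K/V cache sizing. All model and workload numbers below
# are hypothetical assumptions; only the 64 GB card size comes from the text.

def kv_bytes_per_token(layers, kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of K/V cache one token occupies across all layers."""
    # Factor 2: each layer stores one key vector and one value vector.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem


def kv_bytes_per_job(context_tokens, **shape):
    """Bytes of K/V cache for one job holding `context_tokens` tokens."""
    return context_tokens * kv_bytes_per_token(**shape)


# Hypothetical 70B-class shape with grouped-query attention, fp16 cache.
SHAPE = dict(layers=80, kv_heads=8, head_dim=128)
CONTEXT_TOKENS = 128 * 1024   # assumed long-running agent context
CONCURRENT_JOBS = 100         # assumed number of jobs in flight
CARD_BYTES = 64 * 10**9       # p300 capacity as stated above

per_token = kv_bytes_per_token(**SHAPE)
per_job = kv_bytes_per_job(CONTEXT_TOKENS, **SHAPE)
total = CONCURRENT_JOBS * per_job
resident = CARD_BYTES // per_job  # jobs whose full cache fits on one card

print(f"per token:      {per_token / 1024:.0f} KiB")
print(f"per job:        {per_job / 2**30:.1f} GiB")
print(f"all jobs:       {total / 10**12:.2f} TB")
print(f"fit on 1 card:  {resident} (ignoring weights)")
```

Under these assumptions a single long-context job nearly fills a card before the weights are even loaded, and a hundred of them land in multi-terabyte territory, which is why most of that cache has to live somewhere off-card while jobs sit idle in tool calls.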

People complaining about bandwidth are missing the bigger picture, probably because they're unaware that NVMe-oF exists and never kept up with modern network topologies; the hyperscaler Kool-Aid doesn't include either.

[1] https://github.com/corundum/corundum