Comment by tomassre
4 hours ago
There are issues specific to agent workflows. For example, the many requests an agent issues all build on top of the same previous results, so the context (KV cache) needs to be longer, and they use these massive interconnected nodes with direct NVMe to cache the repeatable part of the prompt.

It is about agents in the sense that the design targets long context and many requests, where the initial “chunk” is cached once but shared across many requests.

They don’t call this out specifically, but in the technical details (about the SRAM, how the nodes in a pod are all interconnected) it’s clearly “designed” for it.
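To make the caching idea concrete, here is a toy sketch (my own illustration, not the vendor's implementation) of prefix KV caching: the shared prompt prefix is hashed, the expensive prefill happens once, and every later agent request that starts with the same prefix reuses the cached entry.

```python
import hashlib

class PrefixKVCache:
    """Toy prefix cache: many requests sharing one long prefix hit one entry."""

    def __init__(self):
        self.store = {}   # prefix hash -> stand-in for cached KV blocks
        self.hits = 0
        self.misses = 0

    def _key(self, tokens):
        # Hash the token sequence so identical prefixes map to one cache slot.
        return hashlib.sha256(" ".join(map(str, tokens)).encode()).hexdigest()

    def get_or_compute(self, prefix_tokens):
        k = self._key(prefix_tokens)
        if k in self.store:
            self.hits += 1
        else:
            self.misses += 1
            # In a real system this is the expensive prefill over the prefix
            # (and the entry might spill to NVMe); here we just record its length.
            self.store[k] = {"kv_len": len(prefix_tokens)}
        return self.store[k]

# Many agent turns share the same long system/tool prefix:
cache = PrefixKVCache()
shared_prefix = list(range(1000))  # stand-in for the long repeated prompt chunk
for _ in range(8):
    cache.get_or_compute(shared_prefix)

print(cache.hits, cache.misses)  # 7 1
```

The prefill is paid once; the other seven requests only pay for their new tokens, which is why agent-style workloads benefit from keeping that prefix cached close to the compute.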