Comment by exabrial
2 days ago
I feel like we need an entirely new type of silicon for LLMs. Something completely focused on bandwidth and storage, probably at the sacrifice of raw computation power.
Something like this? (Llama 3.1-8B etched into custom silicon, delivering 16,000 tok/s and using little PCIe bandwidth):

- https://taalas.com/the-path-to-ubiquitous-ai/
- https://chatjimmy.ai/
Wowsa, that’s amazing! Exactly what I was imagining. Doing that with 2,500 watts is incredible.