Comment by MagicMoonlight
20 hours ago
There was a startup posted here which built custom hardware that let the AI respond instantly. Thousands of tokens per second.
Taalas. A sibling comment of yours posted the chat demo URL -
https://chatjimmy.ai/
Woah. How is this working? It's stupid fast.
The weights are mapped directly onto transistors. It's not a general-purpose processor; it's literally a dedicated Llama 8B chip that can't run anything else. The more you specialize the hardware, the faster it gets, and Taalas is pushing that to the limit.
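A toy way to see the idea (this is an analogy, not Taalas's actual design): a generic processor treats weights as data it must fetch at run time, while a weight-specialized chip behaves like a function with the weights baked in as constants, so there is nothing to fetch.

```python
# Toy illustration only: contrast a generic matrix-vector product that
# reads weights from memory with a "specialized" version where the same
# weights are hard-coded -- loosely analogous to mapping a model's
# weights onto fixed transistors.

# Generic path: weights are data, supplied and fetched at run time.
def generic_linear(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

# Specialized path: the (made-up) weights [[2, 0], [1, 3]] are baked in,
# so the "model" IS the function itself; it can compute nothing else.
def baked_linear(x):
    return [2 * x[0] + 0 * x[1],
            1 * x[0] + 3 * x[1]]

W = [[2, 0], [1, 3]]
print(generic_linear(W, [1, 1]))  # [2, 4]
print(baked_linear([1, 1]))       # [2, 4]
```

In software the two run at similar speed; in silicon, removing the weight fetches entirely is where the speedup comes from, at the cost of all flexibility.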
They seem to be doing well. I checked recently and their API is closed to signups due to overwhelming demand.
cerebras
They built an ASIC out of an entire wafer: the whole wafer is one huge active chip. It takes a lot of clever engineering (and cooling) to make it work, and it is very cool.
Groq.
No, it was a custom ASIC with the weights of a single model baked in. I do envision a future where we return to cartridges: local AI becomes the de facto standard, and massively optimised plug-and-play chips each run a single SoTA model.
Likely https://taalas.com