Comment by MagicMoonlight
20 hours ago
There was a startup posted here which built custom hardware that let the AI respond instantly. Thousands of tokens per second.
Taalas. A sibling comment of yours posted the chat demo URL -
https://chatjimmy.ai/
Woah. How is this working? It's stupid fast.
The weights are mapped directly onto transistors. It's not a general-purpose processor; it's literally a dedicated Llama 8B chip that can't run anything else. The more you specialize the hardware, the faster it gets, and Taalas is pushing that to the limit.
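A toy way to see the idea (this is an analogy, not Taalas's actual design): a generic processor treats weights as data it must fetch at run time, while a weight-specialized chip behaves like a function with the weights baked in as constants, so there is nothing to fetch.

```python
# Toy illustration only: contrast a generic matrix-vector product that
# reads weights from memory with a "specialized" version where the same
# weights are hard-coded -- loosely analogous to mapping a model's
# weights onto fixed transistors.

# Generic path: weights are data, supplied and fetched at run time.
def generic_linear(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

# Specialized path: the (made-up) weights [[2, 0], [1, 3]] are baked in,
# so the "model" IS the function itself; it can compute nothing else.
def baked_linear(x):
    return [2 * x[0] + 0 * x[1],
            1 * x[0] + 3 * x[1]]

W = [[2, 0], [1, 3]]
print(generic_linear(W, [1, 1]))  # [2, 4]
print(baked_linear([1, 1]))       # [2, 4]
```

In software the two run at similar speed; in silicon, removing the weight fetches entirely is where the speedup comes from, at the cost of all flexibility.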
They seem to be doing well. I checked recently and their API is closed to signups due to overwhelming demand.
cerebras
They built an ASIC out of an entire wafer: the whole wafer is one huge active chip. It takes a lot of clever engineering (and cooling) to make it work, and it is very cool.
Groq.
No, it was a custom ASIC with the weights of a single model baked in. I do envision a future where we return to cartridges: local AI becomes the de facto standard, and massively optimised plug-and-play chips each run a single SoTA model.
Likely https://taalas.com