Comment by dust42

5 days ago

> Don’t forget that the 8B model requires 10 of said chips to run.

Are you sure about that? If true, it would make the chip look a lot less interesting.

Their 2.4 kW figure appears to cover 10 chips, based on the Next Platform article.

I assume they need all 10 chips for their 8B Q3 model; otherwise they would have said so, or they would have demoed a more impressive model.

https://www.nextplatform.com/2026/02/19/taalas-etches-ai-mod...

  • It doesn’t make any sense to think you need the whole server to run one model. It’s much more likely that each server runs 10 instances of the model.

    1. It doesn’t make sense architecturally. It’s one chip; you can’t split one model across 10 identical hardwired chips.

    2. It doesn’t square with their claims of better power efficiency. 2.4 kW for one model would be really bad.

    • We are both wrong.

      First, it is likely one chip for Llama 8B Q3 with a 1k context size. That could fit into around 3 GB of SRAM, which is about the theoretical maximum at the TSMC N6 reticle limit.

      Second, their plan is to etch larger models across multiple connected chips. Running bigger models is physically impossible otherwise, since ~3 GB of SRAM is about the most you can fit on an 850 mm² chip.

      > followed by a frontier-class large language model running inference across a collection of HC cards by year-end under its HC2 architecture

      https://mlq.ai/news/taalas-secures-169m-funding-to-develop-a...
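
      The SRAM arithmetic above can be sanity-checked with a rough sketch. The layer/head shapes are assumptions based on Llama 3 8B (32 layers, 8 KV heads, head dim 128) and are not stated in the thread; quantization scale/zero-point overhead is ignored.

      ```python
      # Back-of-envelope check: an 8B-parameter model at 3-bit
      # quantization, plus a 1k-token fp16 KV cache, lands near 3 GB.

      params = 8e9               # parameter count
      bits_per_weight = 3        # Q3 quantization
      weights_gb = params * bits_per_weight / 8 / 1e9

      layers, kv_heads, head_dim = 32, 8, 128        # assumed shapes
      kv_bytes_per_token = 2 * layers * kv_heads * head_dim * 2  # K+V, fp16
      kv_gb = kv_bytes_per_token * 1024 / 1e9                    # 1k context

      print(f"weights ~{weights_gb:.2f} GB, KV cache ~{kv_gb:.2f} GB")
      # → weights ~3.00 GB, KV cache ~0.13 GB
      ```

      So the weights alone are ~3 GB, consistent with the reticle-limit estimate; the 1k KV cache adds only a small fraction on top.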

    • Thanks for having a brain.

      Not sure who started that "split into 10 chips" claim; it's just dumb.

      This is Llama 3B hardcoded (literally) on one chip. That's what the startup is about, they emphasize this multiple times.