Comment by barnas2

1 day ago

A company called Taalas is working on something like that. It's not Opus 4.6 quality, but I'm sure they're targeting larger models. Currently they're using a Llama 8B model. It runs at ~17k tokens per second, and you can test it at https://chatjimmy.ai/.

6 comments

Brisk4t 13 hours ago

I'm rooting for them HARD, but they've been quiet since their last (and only) blog post. Their X and LinkedIn accounts are empty too. I really hope it wasn't a pipe dream.

mirekrusin 21 hours ago

It starts to get interesting when the latency is better than the average website's.

vineyardmike 15 hours ago

I’m not sure if this is what you meant, but at 17k t/s, you start to compete with the speed of network calls. You could approach the point of generating an HTML/JS/CSS page faster than some websites can be returned over the network.
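
A rough back-of-the-envelope sketch of that comparison, assuming a steady 17k t/s decode rate and ignoring prompt processing and network overhead. The 2,000-token page size and the 200 ms budget are illustrative assumptions, not measurements, and the function names are made up for this example:

```python
def generation_time_ms(num_tokens: int, tokens_per_second: float = 17_000.0) -> float:
    """Milliseconds needed to emit num_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second * 1000.0


def tokens_within_budget(budget_ms: float, tokens_per_second: float = 17_000.0) -> int:
    """Whole tokens that can be emitted inside a latency budget."""
    return int(tokens_per_second * budget_ms / 1000.0)


PAGE_TOKENS = 2_000  # assumed size of a small HTML/JS/CSS page, in tokens
BUDGET_MS = 200      # assumed latency budget for a page load

print(f"{PAGE_TOKENS} tokens take about {generation_time_ms(PAGE_TOKENS):.0f} ms")
print(f"{BUDGET_MS} ms is enough for about {tokens_within_budget(BUDGET_MS)} tokens")
```

Under those assumptions, a small page would be generated in roughly the time of a single slow network round trip.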

all2 19 hours ago

The immediate load (less than 200ms on my machine through a slow connection) is quite pleasant, tbh.

tomaytotomato 7 hours ago

That's cool. I just tested it out and it is fast, but unfortunately its accuracy is not great.

selcuka 4 hours ago

It's an 8B model. Consider it a proof-of-concept.