Comment by dakolli

4 days ago

Try it here. I hate LLMs, but this is crazy fast. https://chatjimmy.ai/

6 comments

dakolli

  "447 / 6144 tokens"
  "Generated in 0.026s • 15,718 tok/s"

This is crazy fast. I always expected this kind of speed to be ~2 years away, but it's here now.

Lalabadie 4 days ago

The full answer pops up in milliseconds. It's impressive, and it feels like a completely different technology just by forgoing the need to stream the output.

FergusArgyll 4 days ago

Because most models today generate slowish, they give the impression of someone typing on the other end. This is just <enter> -> wall of text. Wild

machiaweliczny 4 days ago

We need that for this Chinese 3B model that thinks for 45s on hello world but also solves math.

Bolwin 3 days ago

Nanbeige. Yeah, this seems ideal for models that scale test-time compute.

Serenacula 3 days ago

Do we know anything about the method?