dakolli 4 days ago
Try here. I hate LLMs, but this is crazy fast. https://chatjimmy.ai/

bmacho 4 days ago
"447 / 6144 tokens" "Generated in 0.026s • 15,718 tok/s" This is crazy fast. I had always predicted this speed would arrive ~2 years in the future, but it's here, now.

Lalabadie 4 days ago
The full answer pops in within milliseconds. It's impressive, and it feels like a completely different technology just by forgoing the need to stream the output.

FergusArgyll 4 days ago
Because most models today generate slowish, they give the impression of someone typing on the other end. This is just <enter> -> wall of text. Wild.

machiaweliczny 4 days ago
We need that for this Chinese 3B model that thinks for 45s on hello world but also solves math.

Bolwin 3 days ago
Nanbeige. Yeah, this seems ideal for models that scale test-time compute.

Serenacula 3 days ago
Do we know anything about the method?