Comment by mirekrusin

20 hours ago

There were bugs at the beginning (IMHO the worst ones were where it kind of works but sucks), so you should retry with the latest llama.cpp/quants/whatever you're using.

Stuff like repeated nonsense, endless ???????? output, bogus code, working fine for the first few hundred tokens and then getting stuck in a loop, gibberish output (with flash attention) after the second or third prompt, flash attention failing with KV-cache quantization on long prompts, chat template / Jinja / tool-calling problems, inconsistent tool calls in agentic coding, mixed-language nonsense and repeated fragments (corrupted llama-server state / grammar-trigger loop), partial CPU offload/fit problems (it would exit reasoning, start coding, interrupt functions after a few lines, then rewrite snippets repeatedly), etc. All of these were unintended and have been fixed.
