Comment by mirekrusin
20 hours ago
There were bugs at the beginning (IMHO the worst kind, where it kind of works but sucks). You should retry with the latest llama.cpp, quants, or whatever you're using.
Stuff like repeated nonsense, endless ???????? output, bogus code, working fine for the first few hundred tokens and then getting stuck in a loop, gibberish output (with flash attention) after the second or third prompt, flash attention failing with KV-cache quantization on long prompts, chat template / Jinja / tool-calling problems, inconsistent tool calls in agentic coding, mixed-language nonsense and repeated fragments (corrupted llama-server state / grammar-trigger loop), partial CPU offload/fit problems (it would exit reasoning, start coding, interrupt functions after a few lines, then rewrite snippets repeatedly), etc. None of that was intended, and all of it has been fixed.