Comment by vorticalbox

13 hours ago

This is a problem I find with Opus: it will spend so long thinking, then go “but wait, what if”

To the point where I stop it and simply tell it to “start writing code, you can work it out as you go along”

Seems writer's block also affects LLMs

26 comments

vorticalbox

In this paper they nerf an LLM's ability to emit waffling thinking tokens like "wait", "but", "alternatively", and the models (they're old, small models in the paper) terminate reasoning faster and perform better. I bet Anthropic is tuning this on their backend.

orbital-decay 2 hours ago

I imagine Anthropic would rather train a small control model instead of resorting to sampling hacks
meatmanek 6 hours ago
This is super cool. Do you know if any of the inference backends (llama.cpp, vLLM, etc.) support this technique?
- iaw 2 hours ago
  
  vLLM supports "banning" certain tokens but I don't know if it can dynamically reduce them.
  To my knowledge you can also "ban" with llama.cpp but it is passed in the API call rather than to the server at initialization.
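
The distinction drawn above between "banning" a token and merely reducing it can be sketched as a bias applied to the logits before sampling. This is a toy illustration over a made-up four-token vocabulary, not the paper's method and not the API of vLLM or llama.cpp; the function names, the vocabulary, and the penalty values are all hypothetical.

```python
import math

def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def penalize_tokens(logits, vocab, suppressed, penalty):
    """Return a copy of `logits` with `penalty` subtracted for suppressed tokens.

    A finite penalty only makes the tokens less likely ("reduce");
    penalty=float("inf") drives their probability to zero ("ban").
    """
    return [
        logit - penalty if token in suppressed else logit
        for logit, token in zip(logits, vocab)
    ]

# Hypothetical vocabulary and logits, for illustration only.
vocab = ["wait", "but", "so", "return"]
logits = [2.0, 1.5, 1.0, 0.5]
hesitation = {"wait", "but"}

base = softmax(logits)
reduced = softmax(penalize_tokens(logits, vocab, hesitation, 3.0))
banned = softmax(penalize_tokens(logits, vocab, hesitation, float("inf")))
```

In a real backend the same effect would be applied per decoding step on token ids rather than strings, and whether the penalty can change dynamically during generation depends on the backend.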

giancarlostoro 11 hours ago

I usually have Claude build a plan first, then I put it into an XML file that it updates with phases. Usually we talk through some of those tasks, and once it's good and I like it, I have Claude implement the plan.

Another thing I tell Claude to do is to not guess but look at the documentation. It messes up a lot less. It might use some tokens reading docs, but at least it has a higher success rate code-wise.
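
A plan file of the kind described above might look something like the following sketch, which builds an XML plan with phases and updates a phase's status. The element and attribute names (`plan`, `phase`, `task`, `status`) and the example plan are assumptions for illustration; the comment does not say what schema is actually used.

```python
import xml.etree.ElementTree as ET

def build_plan(title, phases):
    """Build a <plan> element from (phase_name, [task, ...]) pairs."""
    plan = ET.Element("plan", {"title": title})
    for name, tasks in phases:
        phase = ET.SubElement(plan, "phase", {"name": name, "status": "pending"})
        for description in tasks:
            task = ET.SubElement(phase, "task")
            task.text = description
    return plan

def mark_phase(plan, name, status):
    """Set the status of the named phase; return False if it does not exist."""
    for phase in plan.findall("phase"):
        if phase.get("name") == name:
            phase.set("status", status)
            return True
    return False

# Hypothetical plan, for illustration only.
plan = build_plan(
    "Add CSV export",
    [
        ("design", ["Agree on column order"]),
        ("implement", ["Write exporter", "Add tests"]),
    ],
)
mark_phase(plan, "design", "done")
xml_text = ET.tostring(plan, encoding="unicode")
```

The serialized `xml_text` is what would be saved to the plan file and handed back to the model at the start of each phase.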

xstas1 11 hours ago
XML??
- giancarlostoro 11 hours ago
  
  Apparently, because of how Claude is trained, even the system-level prompts go through as XML, so it works better with XML "prompting". I figured I could have it write plans in XML. Maybe I need to update my ticketing tool to output XML by default.
  https://www.reddit.com/r/ClaudeAI/comments/1psxuv7/anthropic...
  
- aesthesia 3 hours ago
  
  One reason to use XML-like formatting is that it makes the beginning and end of sections explicit. This is less of an issue when the model is generating text but can still be helpful when using templated prompts.
- root-parent 10 hours ago
  
  XML stands for Xtra ML....
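
On the point made above about XML-like tags marking where a section begins and ends in a templated prompt, here is a minimal sketch. The tag names (`instructions`, `document`, `question`) and the helper functions are hypothetical, chosen only to show the idea.

```python
def section(tag, content):
    """Wrap content in an explicit <tag>...</tag> pair."""
    return f"<{tag}>\n{content.strip()}\n</{tag}>"

def build_prompt(instructions, document, question):
    """Assemble a templated prompt whose sections have explicit boundaries."""
    parts = [
        section("instructions", instructions),
        section("document", document),
        section("question", question),
    ]
    return "\n\n".join(parts)

# Hypothetical inputs, for illustration only.
prompt = build_prompt(
    "Answer using only the document.",
    "Release notes: v2 adds CSV export.",
    "What does v2 add?",
)
```

Because each slot is fenced by its own tags, pasted text in the document slot cannot be mistaken for instructions just because of where it sits. This sketch does not escape the content, so a document that itself contains a `</document>` tag would still break the boundary.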
  

mikeocool 12 hours ago

Seriously. Whenever I read the thinking output I get mad and turn down effort to medium or low.

Just output the code and we’ll work through it!

I feel similarly about having Codex review Claude’s plans. I don’t think I’ve ever seen it catch a major issue. It just points out things that would inevitably have been addressed during implementation anyway.

SubiculumCode 7 hours ago

A lot of times this is how humans work. Just start 'putting words on paper', 'think by doing', etc. Sometimes it's more efficient to see why something won't work after writing a bit of it, and sometimes you get lucky and it works right off the bat.

drob518 8 hours ago

Qwen is notorious for this, too. It’ll sometimes spin in a long loop of “But wait…” paragraphs.

epolanski 13 hours ago

Fable was 20 times worse on that.

It was clearly the vibe coding model: like no other model before it, it fully turned you into its assistant instead of the other way around.

RyanHamilton 12 hours ago
Could it be that these firms are optimizing for two things: a) better performance, and b) gathering data from you to further improve performance later? I've also found the huge amount of planning, rather than iteration, frustrating. I've felt like I'm teaching a junior!
- epolanski 12 hours ago
  
  I think they simply optimize for E2E benchmarks. None of those benchmarks is designed around multi-turn assistance to the user; they all go from a prompt straight to the final solution.
  
- happyPersonR 10 hours ago
  
  more thinking == more tokens === more money LOLL
  

thinkingtoilet 12 hours ago

I've been having success with Opus, but you REALLY have to tame it: long prompts that list what files to look at, relationships between entities, etc. I went from regularly hitting my daily limit to almost never hitting it. Oh, and I was also being lazy with small changes, and stopping that helped a lot too. As you said, it gets into these loops where it's just churning, and if you don't stop it, it can go on for way too long.