Comment by leetrout
2 months ago
Related: check out chain of draft if you haven't. Similar performance with roughly 7% of the tokens of chain of thought.
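For anyone unfamiliar, here is a rough sketch of what a chain-of-draft style prompt looks like next to a plain chain-of-thought prompt. The wording, the model name, and the OpenAI client usage are illustrative assumptions on my part, not the exact prompts from the Chain of Draft paper:

```python
# Minimal sketch contrasting chain-of-thought and chain-of-draft style prompting.
# Prompt wording and model name are assumptions, not the paper's exact setup.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

COT_SYSTEM = (
    "Think step by step to answer the question. "
    "Put the final answer after '####'."
)

COD_SYSTEM = (
    "Think step by step, but keep each thinking step to a minimal draft "
    "of at most five words. Put the final answer after '####'."
)

def ask(system_prompt: str, question: str) -> str:
    """Send one question with the given reasoning-style system prompt."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

question = "A jug holds 4 liters. How many jugs fill a 20-liter tank?"
print(ask(COT_SYSTEM, question))  # verbose multi-sentence reasoning
print(ask(COD_SYSTEM, question))  # terse drafts, e.g. "20 / 4 = 5 #### 5"
```

The token savings come entirely from the instruction to keep each reasoning step to a few words rather than full sentences.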
2 months ago
That's a comparison to "CoT via prompting of chat models", not "CoT via training reasoning models with RLVR" (reinforcement learning with verifiable rewards), so it may not apply here.
This seems remarkably less safe?
Why would we want to purposely decrease interpretability?
Very strange.