Comment by serjester

10 months ago

Fraud aside, does anyone have opinions on the <thought> approach they took? It seems like an interesting idea to let the model spread its reasoning across more tokens.

At the same time, it seems like that would already be baked into the model through RLHF? Basically just a different CoT flow?