Comment by golol

10 months ago

What are you basing this on? The one thing that is very clearly stated up front is that this innovation is based on reinforcement learning. You don't even have a good idea what the CoT looks like because those little summary snippets that the ChatGPT UI gives you are nothing substantial.

mjburgess 10 months ago

People repairing ChatGPT replies with additional prompts is reinforcement learning training data.

"Reinforcement learning", just like any term used by AI researchers, is an extremely flexible, pseudo-psychological reskin of some pretty trivial stuff.