Comment by golol
5 months ago
What are you basing this on? The one thing that is very clearly stated up front is that this innovation is based on reinforcement learning. You don't even have a good idea what the CoT looks like, because those little summary snippets that the ChatGPT UI gives you are nothing substantial.
People repairing ChatGPT replies with additional prompts is reinforcement learning training data.
"Reinforcement learning", just like any term used by AI researchers, is an extremely flexible, pseudo-psychological reskin of some pretty trivial stuff.