Comment by comova
2 months ago
I believe this is to improve performance by shortening the context window for long thinking processes. I don't think this is referring to real-time summarizing for the users' sake.
When you do a chat, are the reasoning traces for prior model outputs included in the LLM context?
No, they are normally stripped out.
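For illustration, here is a minimal sketch of what "stripped out" could mean in practice. It assumes a hypothetical message format where each assistant turn may carry a `reasoning` field holding its thinking trace; the field name and structure are assumptions, not any specific provider's API:

```python
# Hypothetical message format: assistant turns may carry a 'reasoning'
# field with the chain-of-thought produced alongside the visible reply.

def build_context(history: list[dict]) -> list[dict]:
    """Rebuild the message list for the next model call, dropping the
    reasoning traces attached to earlier assistant messages."""
    context = []
    for msg in history:
        # Keep role/content, strip the (assumed) 'reasoning' key.
        cleaned = {k: v for k, v in msg.items() if k != "reasoning"}
        context.append(cleaned)
    return context

history = [
    {"role": "user", "content": "What is 17 * 24?"},
    {"role": "assistant", "content": "408",
     "reasoning": "17*24 = 17*20 + 17*4 = 340 + 68 = 408"},
    {"role": "user", "content": "And divided by 2?"},
]
print(build_context(history))  # no 'reasoning' keys remain
```

Only the visible replies survive into the next turn's context, so long thinking traces don't accumulate across the conversation.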
> I don't think this is referring to real-time summarizing for the users' sake.
That's exactly what it's referring to.