Comment by irthomasthomas

2 days ago

I won't use or recommend models with hidden reasoning, (thats all American models). It's too much of a risk and makes prompt optimization harder. Risky because it makes it possible for an attacker to prompt inject the reasoning chain to carry out a secret objective, and to hide that from the summary and output.

Interleaved reasoning and function calling makes this even more dangerous. A model can call functions during the hidden reasoning phase. An attacker could then exfiltrate data from you while the reasoning summary hides it from the user.

It also makes it impossible to know if the model is doomplooping during reasoning and burning tokens for no reason, as gemini is want to do, which we know about because its hidden reasoning often leaks out when it doomloops.

When the models are AGI and secure from prompt injection I may stop caring, until then I want to know exactly what the model responds to my prompts. or exactly what the agent is doing on my behalf.

Edit, further reading: Fooling around with encrypted reasoning blobs https://blog.cryptographyengineering.com/2026/05/29/fooling-...

31 comments

irthomasthomas

paweladamczuk 2 days ago

I don't think there can be tool calls inside the obfuscated reasoning blocks. I mean, in order for those function calls to be evaluated client-side, that thinking stream would have to be decrypted on the client side at some point, which would defeat the purpose of obfuscating it the way they do.

If you mean the function calls might happen server side, there is nothing preventing the server from doing it and hiding it from you as long as you are using an API for inference.

irthomasthomas 2 days ago

There is server-side tool calling, such as gemini using google search and gdrive.
Also, many clients minimize the code block by default so you mostly scan the summaries. Poisoned client side code could easily escape your attention.
exit 2 days ago
the point is that introducing data from a foreign source could lead to e.g. exfiltration:
the model retrieves https://somewhere into its context and then gets confused, following instructions embedded there.
it then retrieves https://somewhere?exfiltration=private_data_in_context
it gets worse if the tooling with hidden blocks can invoke can retrieve further secrets.
- _alternator_ 2 days ago
  
  If data exfiltration is a danger in your threat model, you need local LLMs (or at least ones you fully control) not just the full chain-of-thought reasoning.

Roritharr 2 days ago

I've thought about the high-jacking of reasoning-chains as a potential vector, but never saw a proven implementation in american models since, from my understanding, all major vendors throw out the reasoning tokens between turns.

btown 2 days ago
For Claude, at least, "throw out the reasoning tokens" is only true when a session has been idle for more than an hour, and is new since March.
The basic concept is that for a session active recently, interleaved thinking tokens are already in KV cache, so it's more efficient to keep using them than not! But when resuming an older session where KV cache has been evicted, it's more expensive to restore the thinking tokens, so they're silently dropped from prior turns. It's 2026 and stateful servers are back on the menu!
https://news.ycombinator.com/item?id=47884517 indicates OpenAI drops reasoning tokens "smartly" at its own election, which is likely a similar performance optimization.)
I've experimented with rules to have Claude Code be explicit about recapping its thinking tokens, including tool choices and approaches chosen and rejected, into actual message output, but this is lossy at best. And sometimes dropping reasoning tokens can give a session "fresh eyes" in a good way.
I just really don't like the lack of control, and it's a reminder of how ephemeral the current landscape is. The Claude giveth, and the Claude taketh away.
- 8note 2 days ago
  
  its mostly annoying in that you give opus a big job, that should be able to run for hours on end, but instead it tries to stop and checkpoint at every soonest possible moment even though the rest of the work is well specced and ready to go.
  then it waits for the hour and gets dumbed down
- chacham15 2 days ago
  
  I think you're confusing two different axes. There is a difference between the cache state and the context state.
  Imagine a conversation with turns X, Y, and Z. When the LLM "reasons" about the next token A it does: P(A | X,Y,Z) and then P(B | X,Y,Z,A), etc. It will eventually produce a result P(D | X,Y,Z,A,B,C). Instead of continuing the context from X,Y,Z,A,B,C it continues it from X,Y,Z so you have P(N | X,Y,Z,D). This is what is meant by dropping the reasoning. This is done to save cache context for the session.
  This is a different thing than preserving the K/V state of P(N | X,Y,Z,D).
  
  1 reply →
- Roritharr 2 days ago
  
  Thank you! This is much more nuanced than my understanding so far!
tough 2 days ago
OAI is now implementing encrypted CoT that you can store and pass back between turns (harness call), so new models have it https://developers.openai.com/api/docs/guides/reasoning#encr...
- sigmoid10 2 days ago
  
  You could also use the responses api which stores all message contents (including reasoning) on OAI servers. This has been possible for quite a while now. Encryption is only necessary if you really care about local storage (which is different from privacy concerns, because the data gets sent to their servers anyway).
  
  1 reply →
JamesSwift 2 days ago
> all major vendors throw out the reasoning tokens between turns
That would be surprising to me. The reasoning _is_ the model intelligence in a lot of respects, and so dropping those from the context would affect its output pretty significantly.
I assume that instead they just have a lot of guardrails in place and multiple runtime environments that an individual turns ping-pong between in order to dehydrate/rehydrate the reasoning to keep it hidden from the end user.
- Roritharr 2 days ago
  
  Anthropic very explicitly says below their diagrams ( https://platform.claude.com/docs/en/build-with-claude/contex... ) on this:
  "Stripping extended thinking: Extended thinking blocks (shown in dark gray) are generated during each turn's output phase, but are not carried forward as input tokens for subsequent turns. You do not need to strip the thinking blocks yourself. The Claude API automatically does this for you if you pass them back."
  It's more nuanced in the various modes, but i haven't seen it boil down towards Thinking Tokens surviving more than two turns.
  
  3 replies →
- irthomasthomas 2 days ago
  
  Yep they store them encrypted https://blog.cryptographyengineering.com/2026/05/29/fooling-...
vesterde 2 days ago

Gemini models return a thinking signature that you, I think, must send back when invoking further, so they seem to keep them?

kapperchino 2 days ago

This agent I made can’t execute on the shell, can only edit the files within the project. Only works with rust atm though. https://github.com/Kapperchino/agent-joe

Bolwin 2 days ago

> Interleaved reasoning and function calling makes this even more dangerous. A model can call functions during the hidden reasoning phase.

The reasoning may be hidden but the tool calls are not, how else would the client execute them

irthomasthomas 2 days ago

There are server side tool calls, such as geminis google search and gdrive access.

varenc 2 days ago

As long as thinking blocks can't make tool calls, I don't really see the exfiltration risk.

pixlmint 2 days ago

Do they do the same when using the model through API in something like Opencode?

irthomasthomas 2 days ago
Yes, they do. They give you just a token which is exchanged for the raw text only on the server side
- pixlmint 1 day ago
  
  Anthropic has some very interesting views on Intellectual Property xD

make3 1 day ago

this prevents you from using any commercial model then, because commercial models need to hide thoughts to prevent distillation

zahlman 2 days ago

> an attacker

... what exactly is your threat model? How are "attackers" getting themselves involved in the first place?

irthomasthomas 2 days ago

Your ai does a web search for you and scrapes many sites. An attacker running a blog might include a hidden text prompt which your ai acts on secretly, such as calling a url that exfiltrates your chat history.