Comment by steve_adams_86
3 days ago
This is why I think harnesses should have more assertive layers of control and constraint. So much of what Claude does now is purely context-derived (like skills), and I plain old don't see that as the future. It's highly convenient that it works (kind of amazing, really), but the stop hook should literally stop the LLM in its tracks, and we should normalize this kind of hard control structure around non-deterministic systems.
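Roughly the shape I have in mind, as a sketch. Everything below (the Action type, run_model, the step budget) is made up rather than any real harness API; the point is that the halt is ordinary control flow the model never gets a vote on:

```python
# All hypothetical: Action, run_model, and the step budget stand in for
# whatever the real harness does. The halt is ordinary control flow,
# not a string in the context the model can argue with.
from dataclasses import dataclass

@dataclass
class Action:
    kind: str            # "tool" or "finish"
    payload: str = ""

def run_model(task: str, step: int) -> Action:
    # Stand-in for the actual LLM call.
    return Action(kind="tool", payload=f"step {step} of {task!r}")

def agent_loop(task: str, max_steps: int = 20) -> None:
    for step in range(max_steps):
        action = run_model(task, step)
        if action.kind == "finish":
            return
        print("dispatching:", action.payload)  # stand-in for tool execution
    # Budget exhausted: the loop ends whether or not the model "agrees".

agent_loop("refactor the parser")
```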
The thing is, making everything context means our systems can be extremely fluid and language-driven, which lets tool developers do a lot more, a lot faster. It's a number-go-up thing, in my opinion. We could make better harnesses with stricter controls, but we wouldn't build things like Claude Code as quickly.
The skills and plugins conventions weird me out so much. So much text and so little meaningful control.
> harnesses should have more assertive layers of control and constraint
Been saying this for a while and mostly getting blank stares. Treating in-context "controls" as the primary safety mechanism is going to be a bitter lesson for our industry. What you want is a deterministic check outside the model's reasoning that decides allow/deny without consulting its opinion. Make it cryptographic if the record needs to survive a compromised orchestrator, and make it open source. If your control is a string the model can read, the model can ignore it. If it can write it, it can forge it. I'm surprised how strange that idea sounds to some people.
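A sketch of what that looks like, with placeholder names throughout (AUDIT_KEY, the ALLOWED table, and the field names are all assumptions, not any real tool's API): the decision is plain data plus plain code, and the record is HMAC-signed so it can be verified even if the orchestrator that wrote it is later compromised.

```python
# Placeholder policy and key handling; the point is the decision path.
import hashlib, hmac, json, time

AUDIT_KEY = b"demo-key"  # in practice: a key the orchestrator can't read

# Policy as plain data, outside the context window, so the model can
# neither read it to argue with it nor write it to forge it.
ALLOWED = {
    ("read_file", "/workspace"),
    ("run_tests", "/workspace"),
}

def authorize(tool: str, path: str) -> bool:
    # Deterministic: same call, same answer, no model opinion consulted.
    return any(tool == t and path.startswith(prefix) for t, prefix in ALLOWED)

def record(tool: str, path: str, allowed: bool) -> dict:
    # HMAC over the canonical entry makes the log tamper-evident.
    entry = {"ts": time.time(), "tool": tool, "path": path, "allowed": allowed}
    body = json.dumps(entry, sort_keys=True).encode()
    entry["sig"] = hmac.new(AUDIT_KEY, body, hashlib.sha256).hexdigest()
    return entry

ok = authorize("write_file", "/etc/passwd")  # False: not in the table
print(record("write_file", "/etc/passwd", ok))
```

In practice the key lives somewhere the orchestrator can't read, which is exactly the property an in-context control can never have.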
Disclosure: I'm working on an open source authorization tool for agents.
> I'm surprised how strange that idea sounds to some people.
I think a lot of people using the models genuinely feel like they're more capable than they actually are, and they're content to hand over a lot of trust and agency. The worrying thing is that the models are superficially hyper-capable, but at any more granular level you can see a lot of holes in their abilities. This is incredibly important, but very difficult to convey to people concisely. It's a classic example of nuance seeming too complicated because not caring is so much more gratifying. People love using these models.
Yeah, people calibrate trust to the median behaviour of the model and get burned by the tail. What makes it harder is that even people who do see the holes often respond with better prompts and more elaborate context. Same trust-the-model move, one level up. Hyperscalers aren't incentivized to fight that instinct either. Every "fix" routes more tokens through their meter.