Comment by zambelli

21 days ago

Nice symmetry with tool call failures being sent to LLM that made the call without bugging the user. The artifact-generating entity gets the error back, effectively.

100% correct, and stackable. Could have topic refusal in LLM training itself, forge in tool call alter, and sdlc gates at the workflow level.

3 comments

zambelli

mrothroc 20 days ago

Definitely stacks. The thing that made it clear for me was being explicit about the stages, and where/what you can verify with a guardrail, or gate. I wrote up the framework I use here: https://michael.roth.rocks/research/trust-topology/

Being explicit about the space between the stages is critical, because that's your enforcement point.

zambelli 20 days ago
This is a really neat writeup, and the empirical data for coding agents is super useful. Will take a closer read and see if there's anything I easily lift into my harness!
- mrothroc 19 days ago
  
  Thanks, glad you find it useful! Feel free to ping me if you have any questions.