Comment by mrothroc

21 days ago

Yes, "guardrails" is a squishy term. But it gets clearer if you ask what transition is being guarded.

Some guardrails live inside the model itself, like topic refusals. Forge sits at the tool-call level.

My personal workflow uses guardrails at the SDLC level: I have a standard pipeline (plan, design, code, build, test) with a gate between each pair of stages, and the right composition of gates leads to a much higher-quality final product.

Also worth mentioning: gate failures go back to the agent that produced the artifact, so it gets a chance to fix them. That means I don't have to review obviously wrong output.
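
To make the gate-and-feedback loop concrete, here is a minimal sketch of the idea, not my actual pipeline. Everything in it (`Stage`, `run_pipeline`, `max_attempts`, the toy producer and gate) is a hypothetical name made up for illustration; a real setup would call an agent in `produce` and a compiler, linter, or test runner in `gate`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical types for illustration only.
# A producer takes the upstream artifact plus optional gate feedback and
# returns a new artifact. A gate returns None on pass, or an error message.
Producer = Callable[[str, Optional[str]], str]
Gate = Callable[[str], Optional[str]]


@dataclass
class Stage:
    name: str
    produce: Producer
    gate: Gate


def run_pipeline(stages: List[Stage], initial_input: str, max_attempts: int = 3) -> str:
    """Run stages in order; an artifact only moves on once its gate passes."""
    artifact = initial_input
    for stage in stages:
        upstream = artifact
        feedback: Optional[str] = None
        for _ in range(max_attempts):
            candidate = stage.produce(upstream, feedback)
            # The gate failure goes back to the same producer, not to a human.
            feedback = stage.gate(candidate)
            if feedback is None:
                artifact = candidate
                break
        else:
            # Only now does a person need to look at it.
            raise RuntimeError(
                f"{stage.name}: gate still failing after {max_attempts} attempts: {feedback}"
            )
    return artifact


def toy_coder(spec: str, feedback: Optional[str]) -> str:
    # Stand-in for an agent: wrong on the first try, fixed once it sees feedback.
    if feedback is None:
        return "def add(a, b): return a - b"
    return "def add(a, b): return a + b"


def toy_gate(code: str) -> Optional[str]:
    # Stand-in for a real check such as a test run.
    return None if "a + b" in code else "add() must return a + b"


result = run_pipeline([Stage("code", toy_coder, toy_gate)], "write add()")
```

In this toy run the first attempt fails the gate, the error message is handed back to `toy_coder`, and the second attempt passes, so nothing reaches a reviewer until the retry budget is exhausted.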

Nice symmetry with tool call failures being sent to the LLM that made the call without bugging the user. Effectively, whichever entity generated the artifact gets the error back.

100% correct, and stackable. You could have topic refusal in the LLM training itself, Forge at the tool-call layer, and SDLC gates at the workflow level.

  • Definitely stacks. The thing that made it clear for me was being explicit about the stages, and about where and what you can verify with a guardrail or gate. I wrote up the framework I use here: https://michael.roth.rocks/research/trust-topology/

    Being explicit about the space between the stages is critical, because that's your enforcement point.

    • This is a really neat writeup, and the empirical data for coding agents is super useful. Will take a closer read and see if there's anything I can easily lift into my harness!
