← Back to context

Comment by strongly-typed

14 hours ago

To me it sounds like a tooling problem. OP seems to be trying to use probabilistic text systems as if they enforce rules, but rule enforcement should really live outside the model. My sense is that there was a failure to verify the agent's intent.

The tooling that invokes the model should really define some kind of guardrails. I feel like there's an analogy to be had here with the difference between an untyped program and a typed program. The typed program has external guardrails that get checked by an external system (the compiler's type checker).

What tooling? It's a probabilistic text generator that runs in a black box on the provider's server. What tooling will have which guardrails to make sure that these scattered markdown files are properly injected and used in the text generation?

  • That's the million dollar question. Maybe have systems of agents that all validate each other's work? Maybe something needs to be done at the harness level? I don't suppose that we could realistically expect 100% accuracy, but if we take 100% to be the upper limit, we could build systems that get us closer to that ideal.

    • This is faith in magic. "There's some magic way to make probabilistic text generator running in the cloud to never miss local files"