Comment by adeebvaliulla

5 months ago

That makes a lot of sense, and I like that you’re being explicit about regret minimization rather than chasing local optima.

The Thompson Sampling + Wilson score combo is a pragmatic choice. In practice, most agent systems I see fail not because they lack metrics, but because they overreact to them. Noisy reward signals plus greedy selection is how teams end up whipsawing configs or freezing change altogether. Treating uncertainty as a first-class input instead of something to smooth away is the right move.
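For anyone following along, here is a minimal sketch of how the two pieces could fit together. This is my own illustration, not the actual implementation being discussed: the names `wilson_lower_bound` and `thompson_pick`, the Beta(1, 1) prior, and the success/failure reward model are all assumptions. Thompson Sampling handles selection by drawing from each config's posterior, while the Wilson lower bound gives a conservative point estimate that is less likely to overreact to small samples.

```python
import math
import random


def wilson_lower_bound(successes, trials, z=1.96):
    """Lower end of the Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence (assumed default).
    """
    if trials == 0:
        return 0.0
    p = successes / trials
    denom = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre - margin) / denom


def thompson_pick(stats, rng):
    """Pick a config by sampling each one's Beta posterior.

    stats maps a (hypothetical) config name to (successes, failures).
    A uniform Beta(1, 1) prior is assumed for every config.
    """
    draws = {
        name: rng.betavariate(1 + s, 1 + f)
        for name, (s, f) in stats.items()
    }
    return max(draws, key=draws.get)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical outcome counts: (successes, failures) per config.
    stats = {"config_a": (9, 1), "config_b": (85, 15)}
    print("routed to:", thompson_pick(stats, rng))
    for name, (s, f) in stats.items():
        print(name, "wilson lower bound:", round(wilson_lower_bound(s, s + f), 3))
```

With counts like these, `config_a` has the higher raw success rate (9 of 10) but a lower Wilson bound than `config_b` (85 of 100), which is exactly the kind of case where greedy selection on the raw rate would overreact to thin evidence.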

I also agree with your point on attribution. Perfect attribution is a trap. In real production environments, partial and imperfect outcome signals still beat static configs, as long as the system can reason probabilistically over time. This mirrors what we learned in reliability and delivery metrics years ago: getting the trend right matters more than point accuracy.

One area I’d be curious about as this matures is organizational adoption rather than the math:

- How teams reason about defining outcomes without turning it into a governance bottleneck

- How you help users build intuition around uncertainty and regret so they trust the system when it routes “away” from what feels intuitively right

- Where humans still need to intervene, if anywhere, once the control plane is established

If this holds up across long-tail tasks and low-frequency failures, it feels like a real step toward agents that behave more like adaptive systems and less like fragile workflows with LLMs bolted on.