Comment by P-MATRIX

8 days ago

The skepticism makes sense to me. The core issue isn't wrong outputs; it's that there's no standard way to see what the agent was actually doing when it produced them. Without some structured view of its tool call patterns, norm deviations, and behavioral drift, verification stays manual and expensive. The non-determinism problem and the observability problem feel like one and the same.