Comment by ujjwalreddyks

18 days ago

Thanks for checking this out! 20 autonomous agents interacting with each other sounds intense; that's exactly the kind of multi-agent coordination problem I'm trying to make easier.

On the weights (70/20/10 for capability/latency/cost):

Honestly, those were empirically tuned from my own usage patterns. Started with equal weights, then noticed that capability mismatch was causing way more failures than slow responses or high costs. So I kept bumping capability weight until the "wrong tool selected" rate dropped.

You're spot on about task-type sensitivity though. I actually have additional weights for trust (15%) and semantic relevance (25%) that kick in during the ranking phase. But dynamic weight adjustment per task type is on the roadmap.
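For context, the ranking boils down to a weighted sum over normalized metrics. Here's a minimal sketch; the field names and normalization are illustrative, not the actual implementation:

```python
# Illustrative sketch of the weighted ranking. Real field names and
# normalization in the project may differ.
WEIGHTS = {
    "capability": 0.70,  # match between task needs and tool capabilities
    "latency": 0.20,     # lower latency -> higher normalized score
    "cost": 0.10,        # lower cost -> higher normalized score
}

def score_tool(metrics: dict, weights: dict = WEIGHTS) -> float:
    """Each metric is assumed pre-normalized to [0, 1], where 1 is best."""
    return sum(weights[k] * metrics[k] for k in weights)

# A capable but slow, pricey tool still ranks well because
# capability dominates the weighting.
score = score_tool({"capability": 0.9, "latency": 0.3, "cost": 0.2})
```

The point of the heavy capability weight is visible here: a tool scoring 0.9 on capability survives poor latency and cost scores.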

The idea would be something like:

- "real-time" or "live" in query → boost latency weight to 40%
- "cheap" or "budget" in query → boost cost weight to 30%
- "accurate" or "reliable" in query → boost trust weight to 25%
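A sketch of what that keyword-triggered adjustment could look like. The keyword lists, the floor-style boost, and the re-normalization are all assumptions on my part, not shipped code (trust would work the same way):

```python
# Hypothetical per-query weight adjustment. Keyword sets and the
# re-normalization strategy are assumptions, not the shipped design.
BASE = {"capability": 0.70, "latency": 0.20, "cost": 0.10}
BOOSTS = {
    "latency": ({"real-time", "live"}, 0.40),
    "cost": ({"cheap", "budget"}, 0.30),
}

def adjust_weights(query: str, base: dict = BASE) -> dict:
    q = query.lower()
    weights = dict(base)
    for key, (keywords, floor) in BOOSTS.items():
        # Substring match so "real-time" matches inside longer queries.
        if any(kw in q for kw in keywords):
            weights[key] = max(weights[key], floor)
    # Re-normalize so boosted weights still sum to 1.
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}
```

Re-normalizing after the boost keeps scores comparable across queries, at the price of slightly diluting capability whenever a boost fires.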

Haven't shipped it yet because I wanted to validate the static weights first. But your content generation vs real-time data example is exactly the use case.

On the trust layer: I do evidence-quality scoring, where each API response includes a confidence field, and APIs that return citations or source URLs get a trust boost. The abstention pattern you mentioned is interesting. I currently surface low-confidence results with a warning rather than hiding them, but abstention might be cleaner for agent-to-agent workflows.
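To make that concrete, here's a hedged sketch of the evidence-quality idea. The field names (`confidence`, `citations`, `source_urls`), the boost size, and the warning threshold are illustrative assumptions:

```python
# Sketch of evidence-quality trust scoring. Field names and the
# 0.15 boost are illustrative assumptions, not the project's API.
def trust_score(response: dict) -> float:
    score = float(response.get("confidence", 0.5))  # API-reported confidence
    if response.get("citations") or response.get("source_urls"):
        score = min(1.0, score + 0.15)  # boost responses that cite sources
    return score

def present(response: dict, threshold: float = 0.4) -> dict:
    """Surface low-confidence results with a warning instead of hiding them."""
    result = dict(response)
    if trust_score(response) < threshold:
        result["warning"] = "low-confidence result"
    return result
```

An abstention variant would just return `None` (or raise) below the threshold instead of attaching a warning, which is the trade-off you raised for agent-to-agent use.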

Would love to hear more about how you handle trust scoring in BoTTube. Always looking for battle-tested patterns.