Comment by Imanari

21 days ago

Very cool work! Regarding your finding on "the tool ran successfully and returned data" versus "the tool ran successfully but found nothing": couldn’t this be solved by designing better tool responses instead of adding another layer in between? Just curious and probing my understanding.

100%, a better tool would work, or even remove the problem altogether.

The issue/use case is more about, say, a database table or a legacy system where your tool is just hitting a legacy API that may or may not be good. A surface you don't control.

Honestly, it didn't come up as a use case in this eval. It's more about the concept of a standard, like 4xx vs 5xx. I just felt it was missing from the ecosystem overall.
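To make the 4xx vs 5xx analogy concrete, here is a minimal sketch of what such a convention might look like when wrapping a legacy call you don't control. Every name in it (`ToolStatus`, `ToolResult`, `wrap_legacy_call`) is hypothetical and made up for illustration, not taken from the eval or from any existing library, and the "empty" check is an assumption about what "found nothing" means for a given API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable


class ToolStatus(Enum):
    """Hypothetical status vocabulary, loosely analogous to HTTP status classes."""
    OK_DATA = "ok_data"    # tool ran successfully and returned data
    OK_EMPTY = "ok_empty"  # tool ran successfully but found nothing
    ERROR = "error"        # tool (or the legacy API behind it) failed


@dataclass
class ToolResult:
    status: ToolStatus
    data: Any = None
    detail: str = ""


def wrap_legacy_call(call: Callable[[], Any]) -> ToolResult:
    """Run a legacy call and normalize its outcome into an explicit status.

    Assumption: None, an empty list, an empty dict, or an empty string
    all mean "ran fine, found nothing". A real API may need other checks.
    """
    try:
        raw = call()
    except Exception as exc:
        # Failure is reported explicitly instead of looking like "no results".
        return ToolResult(ToolStatus.ERROR, detail=f"{type(exc).__name__}: {exc}")

    if raw is None or raw == [] or raw == {} or raw == "":
        return ToolResult(ToolStatus.OK_EMPTY, detail="call succeeded, no matching records")

    return ToolResult(ToolStatus.OK_DATA, data=raw)


def failing_call() -> Any:
    # Stand-in for a legacy API that is down.
    raise TimeoutError("upstream down")


if __name__ == "__main__":
    print(wrap_legacy_call(lambda: [{"id": 1}]).status)
    print(wrap_legacy_call(lambda: []).status)
    print(wrap_legacy_call(failing_call).status)
```

The point of the wrapper is only that the three outcomes stay distinguishable to the caller even when the underlying API returns the same empty payload for "nothing found" and "something broke".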