Comment by zambelli

21 days ago

Oh, interesting idea. Formalizing an abstraction layer for testing all the integration types out there in the AI ether, essentially? MCP, skills, etc.

I think this sits a level above Forge, maybe testing the workflow proper plus any integration points it surfaces (for example, if some tools give access to an MCP or something).

Could likely layer both together without much trouble.

The one thing I'd be curious about is how you handle the non-determinism of these models. Sometimes they get the tool call right, sometimes they barf bad JSON. Does the suite run multiple trials?
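
To make the question concrete, here's roughly what I mean by multiple trials. This is only a sketch under my own assumptions: `is_valid_tool_call`, `pass_rate`, and `fake_model` are made-up names, the "name"/"arguments" shape is just a guess at a typical tool-call payload, and none of it is Forge's API or yours. The idea is to score a pass rate over N runs instead of trusting a single pass/fail.

```python
import json
import random


def is_valid_tool_call(raw, required_keys=("name", "arguments")):
    """Return True if raw parses as a JSON object containing the required keys."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(call, dict) and all(k in call for k in required_keys)


def pass_rate(generate, trials=20):
    """Call generate() `trials` times and return the fraction of valid tool calls."""
    passed = sum(is_valid_tool_call(generate()) for _ in range(trials))
    return passed / trials


def fake_model(rng, p_good=0.8):
    """Hypothetical stand-in for a real model: sometimes emits truncated JSON."""
    good = '{"name": "search", "arguments": {"query": "forge"}}'
    bad = '{"name": "search", "arguments": {"query": '  # cut off mid-object
    return lambda: good if rng.random() < p_good else bad


if __name__ == "__main__":
    rate = pass_rate(fake_model(random.Random(0)), trials=50)
    print(f"valid tool calls: {rate:.0%}")
```

A test would then assert something like `rate >= 0.9` rather than demanding that every single run passes, with the threshold picked per tool.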


deevus 21 days ago

[dead]