
Comment by dannyw

8 hours ago

So that was our assumption too while building it, but I'm genuinely surprised by how well frontier models can work with large and 'lightly-documented' SDKs.

I think a big part of it comes from deliberately exposing the lowest-level atomic actions, not higher-level wrappers with use-case-specific documentation. Instead, we supply very technical, 'dry' documentation (inputs, actions/effects, return values and types). We leave it to the developer (or the LLM) to write scripts that assemble these pieces to solve problems.
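A minimal sketch of what that shape can look like. To be clear, every name here is invented for illustration and is not from the actual SDK: each function is a low-level atomic action with dry documentation (inputs, effect, return value) and no use-case guidance, and a task-specific script composes them.

```python
# Hypothetical illustration only; these names are not from the real SDK.
# A toy in-memory store stands in for the backend.
_STORE = {"a1": {"parent": "inbox"}, "a2": {"parent": "inbox"}}

def list_items(folder_id: str) -> list[str]:
    """Inputs: folder_id (str).
    Effect: none (read-only).
    Returns: list[str] of item ids whose parent is folder_id."""
    return [i for i, rec in _STORE.items() if rec["parent"] == folder_id]

def move_item(item_id: str, dest_folder_id: str) -> dict:
    """Inputs: item_id (str), dest_folder_id (str).
    Effect: re-parents item_id to dest_folder_id.
    Returns: {"id": str, "parent": str} reflecting the new state."""
    _STORE[item_id]["parent"] = dest_folder_id
    return {"id": item_id, "parent": dest_folder_id}

# The developer (or the LLM) assembles the atoms into a reusable script:
def archive_folder(folder_id: str, archive_id: str = "archive") -> int:
    """Move every item out of folder_id into archive_id; return count moved."""
    moved = 0
    for item_id in list_items(folder_id):
        move_item(item_id, archive_id)
        moved += 1
    return moved
```

Once a script like `archive_folder` works, it can be saved and re-run without the model in the loop at all.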

If you try it with Cowork and Opus 4.7 (recommended), you'll probably see it try a few different technical approaches and iterate as it tries to accomplish the task. While that's less token-efficient, the benefit is flexibility and power, and once you have a solid script, you can save it and use it again and again without any further token costs.

Right, but this interaction was not documented, so it would never have been found by an LLM. Or are you saying that a hallucination will match up with a lacuna in the documentation often enough to make up for errors elsewhere?