My general understanding of the concenus on most models these days is that people consider google models to be some of the worst at tool calling, so certainly an interesting choice. Did you do any evals on this?
Hi, would love to know where you get that impression on 1 shot tool calling, was there concrete evaluation carried out? pretty new to this and was a bit lost when trying to compare models on different capabilities.
My general understanding of the concenus on most models these days is that people consider google models to be some of the worst at tool calling, so certainly an interesting choice. Did you do any evals on this?
Hi, would love to know where you get that impression on 1 shot tool calling, was there concrete evaluation carried out? pretty new to this and was a bit lost when trying to compare models on different capabilities.