Comment by dudeinhawaii

1 day ago

This isn't my own phrasing (I saw it in a YouTube video, but I agree with it) -- LLMs are not calculators. It's as simple as that.

If the LLM can complete the task using tools, then it's a pass.

Apple's team went out of their way to select tests that LLMs would struggle with, took away their tools, and then had the audacity to write that they were surprised at the outcome. Who would be surprised? No one who has used AI since GPT-4 expects these models to be calculators or algorithm executors.

You want the LLM to be smart enough to realize "I can't do this without tools," grab the tool, use it correctly, and give you the actual correct answer. If you prevent LLMs from using tools or from writing and executing code, you're intentionally crippling them.
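To make the point concrete, here is a minimal sketch of that loop: the model recognizes it can't do the arithmetic reliably, emits a tool call, and the harness executes the tool and returns the real answer. Every name here (fake_model, calculator, run) is a hypothetical illustration, not any particular vendor's API.

```python
def calculator(expression: str) -> int:
    # Tiny calculator tool: handles "a OP b" with integer operands.
    a, op, b = expression.split()
    ops = {"+": lambda x, y: x + y,
           "-": lambda x, y: x - y,
           "*": lambda x, y: x * y}
    return ops[op](int(a), int(b))

TOOLS = {"calculator": calculator}

def fake_model(prompt: str) -> dict:
    # Stand-in for the LLM's decision step. A real model would emit a
    # structured tool call; here it is hard-coded for the demo.
    if "937 * 4821" in prompt:
        return {"tool": "calculator", "args": {"expression": "937 * 4821"}}
    return {"answer": "I can answer this directly."}

def run(prompt: str):
    # The harness: if the model asks for a tool, execute it and hand
    # back the tool's result instead of a guessed number.
    reply = fake_model(prompt)
    if "tool" in reply:
        return TOOLS[reply["tool"]](**reply["args"])
    return reply["answer"]

print(run("What is 937 * 4821?"))  # 4517277
```

The benchmark question is whether the model takes the tool path when it should, not whether it can multiply in its head.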

I think that's perfectly reasonable for problems that have already been solved and for which tools already exist. But there are a lot of problems for which tools don't exist yet and will need to be developed.

In other words, being able to go to the produce aisle means I don’t need to know how to farm, but it also doesn’t make me a farmer.