Comment by mattmanser
11 hours ago
Ok, if you're a senior dev, have you 'caught' it yet?
Ask it a question about something you know well, and it'll give you garbage code that it obviously copied from a 10-year-old answer on SO.
When you ask it for research, it's still giving you garbage, out-of-date information copied from SO 10 years ago. You just don't know it's garbage.
That's why you don't use LLMs as a knowledge source without giving them tools.
"Agents use tools in a loop to achieve a goal."
If you don't give any tools, you get hallucinations and half-truths.
But give one a tool to do, say, web searches and it's going to be a lot smarter. That's where 90% of the innovation with "AI" today is coming from. The raw models aren't getting that much smarter anymore, but the scaffolding and frameworks around them are.
Tools are the main reason Claude Code is as good as it is compared to the competition.
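To make "tools in a loop" concrete, here's a rough sketch of the shape I mean. Everything in it is made up for illustration: `web_search` returns canned text instead of hitting the network, and `fake_model` is a stub standing in for a real LLM call. It's not how Claude Code or any particular framework does it, just the general idea.

```python
from typing import Callable

# Hypothetical tool: a stand-in for a real web search (canned results, no network).
def web_search(query: str) -> str:
    canned = {"python csv": "Use the csv module: csv.reader / csv.DictReader."}
    return canned.get(query.lower(), "no results")

# Registry of tools the agent is allowed to call, keyed by name.
TOOLS: dict[str, Callable[[str], str]] = {"web_search": web_search}

# Hypothetical model: a stub that requests one search, then answers from the result.
# A real agent would call an LLM here and parse its reply into this dict shape.
def fake_model(goal: str, history: list[tuple[str, str]]) -> dict:
    if not history:
        return {"action": "tool", "tool": "web_search", "input": goal}
    return {"action": "answer", "text": history[-1][1]}

def run_agent(goal: str, model=fake_model, max_steps: int = 5) -> str:
    history: list[tuple[str, str]] = []
    for _ in range(max_steps):
        step = model(goal, history)
        if step["action"] == "answer":
            return step["text"]
        # The model asked for a tool: run it and feed the result back in.
        result = TOOLS[step["tool"]](step["input"])
        history.append((step["tool"], result))
    # Step limit keeps a confused model from looping forever.
    return "gave up: step limit reached"

if __name__ == "__main__":
    print(run_agent("python csv"))
```

The point is that the answer comes from the tool result fed back into the loop, not from whatever the model happens to remember.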
Yes, that is my understanding as well, though it gets me thinking: if that is true, then what real value does the LLM on the server offer compared to running a model locally with tools?
You still can't beat an acre of specialized compute with any kind of home hardware. That's pretty much the power of cloud LLMs.
For a tool-use loop, local models are getting to "OK" levels. When they get to "pretty good", most of my own stuff can run locally, since it's basically just coordinating tool calls.
Of course, step one is always to think critically and evaluate for bad information. For research, I mainly use it for things that are testable/verifiable; for example, I used it for a tricky proxy chain setup. I did try to use it to learn a language a few months ago, which I think was counterproductive for the reasons you mentioned.
I use web search (DDG) and in the vast majority of cases I don’t think I have ever needed more than one query. Why? Because I know where the answer is; I’m using the search engine as an index to find it. Like “csv python” to find that page in the docs.