Comment by mattmanser

2 months ago

Ok, if you're a senior dev, have you 'caught' it yet?

Ask it a question about something you know well, and it'll give you garbage code that it has obviously copied from an answer on SO from 10 years ago.

When you ask it for research, it's still giving you garbage, out-of-date information it copied from SO 10 years ago; you just don't know it's garbage.


theshrike79 2 months ago

That's why you don't use LLMs as a knowledge source without giving them tools.

"Agents use tools in a loop to achieve a goal."

If you don't give any tools, you get hallucinations and half-truths.

But give one a tool to do, say, web searches and it's going to be a lot smarter. That's where 90% of the innovation with "AI" today is coming from. The raw models aren't getting that much smarter anymore, but the scaffolding and frameworks around them are.

Tools are the main reason Claude Code is as good as it is compared to the competition.
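
To make "tools in a loop" concrete, here is a toy sketch of the shape of such a loop. Everything in it is an assumption for illustration: `fake_model`, `web_search`, and `TOOLS` are hypothetical stand-ins, not any real LLM or vendor API, and the "search" just returns canned text.

```python
# Toy illustration only: fake_model, web_search and TOOLS are hypothetical
# stand-ins, not a real LLM client or search API.

def web_search(query: str) -> str:
    """Pretend search tool; a real one would call an actual search service."""
    canned = {"python csv docs": "See the csv module in the standard library docs."}
    return canned.get(query, "no results")


# Registry mapping tool names to plain Python callables.
TOOLS = {"web_search": web_search}


def fake_model(goal: str, observations: list[str]) -> dict:
    """Stand-in for the LLM: request one search, then answer from its result."""
    if not observations:
        return {"type": "tool_call", "tool": "web_search", "args": {"query": goal}}
    return {"type": "final", "answer": observations[-1]}


def run_agent(goal: str, max_steps: int = 5) -> str:
    """Ask the model what to do, run the requested tool, feed the result back."""
    observations: list[str] = []
    for _ in range(max_steps):
        action = fake_model(goal, observations)
        if action["type"] == "final":
            return action["answer"]
        tool = TOOLS[action["tool"]]
        observations.append(tool(**action["args"]))
    return "gave up: step limit reached"


print(run_agent("python csv docs"))
```

The scaffolding is the `run_agent` loop and the tool registry; the model only decides which tool to call next and when to stop.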

andrekandre 2 months ago
> The raw models aren't getting that much smarter anymore, but the scaffolding and frameworks around them are.
Yes, that is my understanding as well, though it gets me thinking: if that is true, then what real value is the LLM on the server compared to running a model locally with tools?
- theshrike79 2 months ago
  
  You still can't beat an acre of specialized compute with any kind of home hardware. That's pretty much the power of cloud LLMs.
  For a tool-use loop, local models are getting to "OK" levels. When they get to "pretty good", most of my own stuff can run locally, since it's basically just coordinating tool calls.

jmogly 2 months ago

Of course, step one is always to think critically and evaluate for bad information. For research, I mainly use it for things that are testable/verifiable; for example, I used it for a tricky proxy chain setup. I did try to use it to learn a language a few months ago, which I think was counterproductive for the reasons you mentioned.

mattmanser 2 months ago

How can you critically assess something in a field you're not already an expert in?
That Python you just got might look good, but it could be rewritten from 50 lines to 5: it's written in 2010 style, it's not using modern libraries, it's not using modern syntax.
And it is 50 to 5. That is the scale we're talking about in a good 75% of AI-produced code unless you challenge it constantly. Not using modern syntax to reduce boilerplate, over-guarding against impossible state, ridiculous amounts of error handling. It is basically a junior dev on steroids.
Most of the time you have no idea that most of that code is totally unnecessary unless you're already an expert in that language AND the libraries it's using. And you're rarely an expert in both, or you wouldn't even be asking, as it would have been quicker to write the code than even write the prompt for the AI.
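
A small made-up example of the pattern, not output from any actual model: both functions below count words, and the names `word_counts_verbose` and `word_counts_concise` are hypothetical. The ratio here is smaller than 50 to 5, but the ingredients are the same: manual bookkeeping, guards against states that cannot occur, and error handling around calls that do not raise.

```python
from collections import Counter


def word_counts_verbose(text):
    # 2010-style: manual dict bookkeeping plus defensive checks.
    counts = {}
    if text is None:
        return counts
    if not isinstance(text, str):
        return counts
    words = text.split()
    if words is None:  # impossible state: str.split() always returns a list
        return counts
    for i in range(len(words)):
        word = words[i]
        try:
            word = word.lower()
        except Exception:  # str.lower() does not raise on a str
            continue
        if word in counts:
            counts[word] = counts[word] + 1
        else:
            counts[word] = 1
    return counts


def word_counts_concise(text: str) -> Counter:
    # Same result for any str input; assumes the caller passes a str.
    return Counter(text.lower().split())


sample = "the cat saw The Cat"
print(word_counts_verbose(sample) == dict(word_counts_concise(sample)))
```

If you don't already know `collections.Counter` exists, the first version looks perfectly reasonable, which is the problem.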

skydhash 2 months ago

I use web search (DDG), and in the vast majority of cases I don’t think I have ever tried more than one query. Why? Because I know where the answer is; I’m using the search engine as an index to where I can find it. Like “csv python” to find that page in the docs.