Comment by seanmcdirmid

2 months ago

Ok, I’ll bite: how is that different from humans?

52 comments

seanmcdirmid

Human behaviour is goal-directed because humans have executive function. When you turn off executive function by going to sleep, your brain will spit out dreams. Dream logic is famous for being plausible but unhinged.

I have the feeling that LLMs are effectively running on dream logic, and everything we've done to make them reason properly is insufficient to bring them up to human level.

seanmcdirmid 2 months ago
Isn’t a modern LLM with thinking tokens fairly goal directed? But yes, we hallucinate in our sleep while LLMs will hallucinate details if the prompt isn’t grounded enough.
- zarzavat 2 months ago
  
  The thing about dream logic is that it can be a completely rational series of steps, but there's usually a giant plot hole which you only realise the second you wake up.
  This definitely matches my experience of talking to AI agents and chatbots. They can be extremely knowledgeable on arcane matters yet need to have obvious (to humans) assumptions pointed out to them, since they only have book smarts and not street smarts.
- tovej 2 months ago
  
  Assuming this is not a rhetorical question: no, it is not. The only "goal" is to maximize plausibility.
  
tsunamifury 2 months ago
It’s amazing how much you get wrong here: LLM attention layers are stacked goal functions.
What they lack is multi-turn, long-walk goal functions, which is being solved to some degree by agents.
- strken 2 months ago
  
  I don't argue that thinking and attention are missing. I argue that they are trying to do the job of human executive function but aren't as good at it.
nemo44x 2 months ago
LLMs are literally goal machines. It’s all they do. So it’s important that you input specific goals for them to work towards. It’s also why logically you want to break the problem into many small problems with concrete goals.
- andai 2 months ago
  
  Do you only mean instruct-tuned LLMs? Or the base (pretrained) model too?
  
satvikpendem 2 months ago
A prompt for an LLM is also a goal direction and it'll produce code towards that goal. In the end, it's the human directing it, and the AI is a tool whose code needs review, same as it always has been.
- basch 2 months ago
  
  I'd argue humans have some sort of parallelism going on that machines don't yet. Thoughts happening at multiple abstraction levels simultaneously. As I am doing something, I am also running the continuous improvement cycle in my head, at all four steps concurrently. Is this working, is this the right direction, does this validate?
  You could build layers and layers of LLMs watching the output of each other's thoughts and offering different commentary as they go, folding all the thoughts back together at the end. Currently, a group of agents acts more like a discussion than something somewhat omnipotent or omnitemporal.
whoamii 2 months ago

Some of my best code comes from my dreams though.
spiderfarmer 2 months ago
And yet LLMs are incredibly useful as they are right now.
- strken 2 months ago
  
  And yet they're going to be better in a decade, which will require understanding why they aren't perfect today.

apical_dendrite 2 months ago

The volume is different. Someone submitted a PR this week that was 3800 lines of shell script. Most of it was crap and none of it should have been in shell script. He's submitting PRs with thousands of lines of code every day. He has no idea how any of it actually works, and it completely overwhelms my ability to review.

Sure, he could have submitted an ill-considered 3800-line PR five years ago, but it would have taken him at least a week and there probably would have been opportunities to submit smaller chunks along the way or discuss the approach.

switchbak 2 months ago

It’s harder when the person doing what you describe has the ability to have you fired. Power asymmetry + irresponsible AI use + no accountability = a recipe for a code base going right to hell in a few months.
I think we’re going to see a lot of the systems we depend on fail a lot more often. You’d often see an ATM or flight status screen have a BSOD - I think we’re going to see that kind of thing everywhere soon.
satvikpendem 2 months ago

Just block that user, that seems to be the way.

somewhereoutth 2 months ago

Humans have a 'world model' beyond the syntax - for code, an idea of what the code should do and how it does it. Of course, some humans are better than others at this; they are recognized as good programmers.

satvikpendem 2 months ago
Papers show that AI also has a world model, so I don't think that's the right distinction.
- tovej 2 months ago
  
  Could you please cite these papers? If by AI you mean LLMs, that is not supported by what I know. If you mean a theoretical world-model-based AI, that's just a tautological statement.
  

detourdog 2 months ago

What surprises me about the current development environment is the acceleration of technical debt. When I was developing my skills, the nagging feeling that I didn't quite understand the technology was a big dark cloud. I felt this cloud was technical debt. This was always what I was working against.

I see current expectations that technical debt doesn't matter. The current tools embrace superficial understanding. These tools paper over the debt. There is no need for deeper understanding of the problem or solution. The tools take care of it behind the scenes.

wood_spirit 2 months ago

It’s not. LLMs are just averaging their internet snapshot, after all.

But people want an AI that is objective and right. HN is where people who know the distinction hang out, but it’s not what the layperson thinks they are getting when they use this miraculous super hyped tool that everybody is raving about.

mrwh 2 months ago
The etiquette, even at the bigtech place I work, has changed so quickly. The idea that it would be _embarrassing_ to send a code review with obvious or even subtle errors is disappearing. More work is being put on the reviewer. Which might even be fine if we made the further change that _credit goes to the reviewer_. But if anything we're heading in the opposite direction, lines of code pumped out as the criterion of success. It's like a car company that touts how _much_ gas its cars use, not how little.
- wood_spirit 2 months ago
  
  Review is usually delegated to an AI too
satvikpendem 2 months ago
By now, a few years after ChatGPT was released, I don't think anyone believes AI is objective and right; all users have seen at least one instance of it hallucinating or simply being wrong.
- wood_spirit 2 months ago
  
  Sorry, I can think of so many counterexamples. I also see a lot of people who say “well, it hallucinates about subject X” (which the person knows well, so can spot the hallucination) but continue to trust it on subjects Y and Z (which the person knows less well, so can’t spot the hallucinations).
  YMMV.
  
seanmcdirmid 2 months ago

There are a lot of binary thinkers on HN, but they shouldn’t make up a majority.

rDr4g0n 2 months ago

It's much easier to fire an employee who produces low quality/effort work than to convince leadership to fire Claude.

satvikpendem 2 months ago

You can fire employees who don't review generated code, though, because ultimately it's their responsibility to own their code, whether they hand wrote it or an LLM did.
It seems to me that it's all a matter of company culture, as it has always been, not AI. Those that tolerate bad code will continue to tolerate it, at their peril.