Comment by belter
1 day ago
>> From where I’m standing, it’s scary.
You are being fooled by randomness [1]
Not because the models are random, but because you are mistaking a massive combinatorial search over seen patterns for genuine reasoning. Taleb's point was about confusing luck for skill. Don't confuse interpolation for understanding.
You can read a Rust book after years of Java, then go build software for an industry that did not exist when you started. Ask any LLM to write a driver for hardware that shipped last month, or model a regulatory framework that just passed... It will confidently hallucinate. You will figure it out. That is the difference between pattern matching and understanding.
I've worked with a lot of interns, fresh college graduates, overseas lowest bidders, and mediocre engineers who gave up years ago, all over the course of a ~20-year career.
Not once in all that time has anyone PRed and merged my completely unrelated and unfinished branch into main. Except a few weeks ago. By someone who was using the LLM to make PRs.
He didn't understand when I asked him about it and was baffled as to how it happened.
Really annoying, but I got significantly less concerned about the future of human software engineering after that.
Have you used an LLM specifically trained for tool calling, in Claude Code, Cursor or Aider?
They’re capable of looking up documentation and correcting their errors by compiling and running tests, and when coupled with a linter, hallucinations are a non-issue.
I don’t really think it’s possible to dismiss a model that’s been trained with reinforcement learning for both reasoning and tool usage as only doing pattern matching. They’re not at all the same beasts as the old style of LLMs based purely on next token prediction of massive scrapes of web data (with some fine tuning on Q&A pairs and RLHF to pick the best answers).
I'm using Claude code to help me learn Godot game programming.
One interesting thing is that Claude will not tell me if I'm following the wrong path. It will just make the requested change to the best of its ability.
For example, in a Tower Defence game I'm making, I wanted to keep turret position state in an AStarGrid2D. It produced code to do this, but the code became harder and harder to follow as I went on. It was only after watching more tutorials that I figured out I was asking for the wrong thing (TileMapLayer is a much better choice).
LLMs still suffer from garbage in, garbage out.
Don't use LLMs for Godot game programming.
edit: Major engine changes have occurred after the models were trained, so you will often be given code that refers to nonexistent constants and functions and which is not aware of useful new features.
Before coding, I just ask the model: "What are the best practices in this industry to solve this problem? What tools/libraries/approaches do people use?"
After coding, I ask it: "Review the code. Do you see anything here that common libraries already implement? Are there ways to make it more idiomatic?"
You can also ask it: "This is an idea somebody told me for how to solve it. What do you think of it? Are there better ways?"
Ask a model to
"Write a chess engine where pawns move backward and kings can jump like nights"
It will keep slipping back into real chess rules. It learned chess, it did not understand the concept of "rules"
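To make the ask concrete, here is a rough sketch of just the two modified rules (Python, with an assumed 8x8 board layout and orientation, not any real chess library). The failure mode is that models drift back to the standard moves instead of sticking to something like this:

    # Sketch of only the two modified rules, not a full engine.
    # Assumes an 8x8 board indexed [row][col], and that white pawns normally
    # advance toward row 0 in this layout, so "backward" means toward row 7.
    # Both the board representation and the orientation are illustrative assumptions.

    KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                    (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

    def on_board(r, c):
        return 0 <= r < 8 and 0 <= c < 8

    def pawn_moves(r, c, is_white):
        """Pawns move backward: the step direction is inverted from standard chess."""
        step = 1 if is_white else -1
        return [(r + step, c)] if on_board(r + step, c) else []

    def king_moves(r, c):
        """Kings jump like knights instead of stepping one square at a time."""
        return [(r + dr, c + dc) for dr, dc in KNIGHT_JUMPS if on_board(r + dr, c + dc)]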
Or
Ask it to reverse a made-up word like
"Reverse the string 'glorbix'"
It will get it wrong on the first try. You would not fail.
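For reference, the ground truth is a trivial one-liner (Python here), so there is nothing to "know":

    word = "glorbix"
    print(word[::-1])  # xibrolg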
Or even better, ask it to...
"Use the dxastgraphx library to build a DAG scheduler."
dxastgraphx is a nonexistent library...
Marvel at the results... I tried it in both Claude and ChatGPT.
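Since dxastgraphx does not exist, any API the model produces for it is invented. For contrast, a minimal sketch that only uses Python's real standard library (graphlib, 3.9+) might look like this; the task graph itself is made up for illustration:

    from graphlib import TopologicalSorter

    # node -> set of nodes it depends on (illustrative tasks, not from the thread)
    tasks = {
        "fetch": set(),
        "build": {"fetch"},
        "test": {"build"},
        "package": {"build"},
        "publish": {"test", "package"},
    }

    ts = TopologicalSorter(tasks)
    ts.prepare()
    while ts.is_active():
        for name in ts.get_ready():   # tasks whose dependencies have all finished
            print(f"running {name}")  # real work, or dispatch to a thread pool
            ts.done(name)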
I've just tried the dxastgraphx one in pi with Opus 4.5. This was its response:
Why would I ask the model to reverse the string 'glorbix,' especially in the context of software engineering?
Just tried to reverse the string you provided using Gemini. It worked fine on the first try.
You’re trying to interrogate a machine as you would a human and presenting this as evidence that machines aren’t humans. Yes, you’re absolutely right! And also completely missing the point.
Why would you expect an LLM or even a human to succeed in these cases? “Write a piece of code for a specification that you can’t possibly know about?” That’s why you have to do context engineering, just like you’d provide a reference to a new document to an engineer writing code.
This is exactly what happened to me: anything novel or uncommon and it hallucinates or invents the wrong thing.
It is OK for getting snippets, for example saying (as I did): "Please make this MVVM style." It is not perfect, but it saves time.
For very broad or novel reasoning, as of today... forget it.