Comment by joefourier

2 months ago

Have you used an LLM specifically trained for tool calling, in Claude Code, Cursor or Aider?

They’re capable of looking up documentation and of correcting their errors by compiling and running tests; when coupled with a linter, hallucinations are a non-issue.

I don’t really think it’s possible to dismiss a model that’s been trained with reinforcement learning for both reasoning and tool usage as only doing pattern matching. They’re not at all the same beasts as the old style of LLMs based purely on next-token prediction of massive scrapes of web data (with some fine-tuning on Q&A pairs and RLHF to pick the best answers).
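
A rough sketch of the kind of check a tool-using agent can run before trusting an import. This is only an illustration, not how Claude Code, Cursor or Aider are implemented: ask the interpreter whether each top-level module resolves in the current Python environment. `missing_modules` is a hypothetical helper name.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be resolved locally.

    importlib.util.find_spec returns None when a top-level module is not
    installed, so a hallucinated package name shows up in the result
    instead of failing later at import time.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "graphlib" ships with Python 3.9+; "dxastgraphx" is the made-up
    # library discussed later in this thread.
    print(missing_modules(["graphlib", "dxastgraphx"]))
```

This only sees what is installed locally, so it also flags real packages that simply are not installed; a registry lookup is the complementary check.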


treespace8 2 months ago

I'm using Claude Code to help me learn Godot game programming.

One interesting thing is that Claude will not tell me if I'm following the wrong path. It will just make the requested change to the best of its ability.

For example, in a tower defence game I'm making, I wanted to keep turret position state in an AStarGrid2D. It produced code to do this, but the code became harder and harder to follow as I went on. It was only after watching more tutorials that I figured out I was asking for the wrong thing (TileMapLayer is a much better choice).

LLMs still suffer from garbage in, garbage out.

jennyholzer3 2 months ago

Don't use LLMs for Godot game programming.

Edit: Major engine changes have occurred since the models were trained, so you will often be given code that refers to nonexistent constants and functions and that is not aware of useful new features.

memoriuaysj 2 months ago

Before coding, I just ask the model: "What are the best practices in this industry to solve this problem? What tools/libraries/approaches do people use?"
After coding, I ask it: "Review the code. Do you see any parts for which there are common libraries implementing them? Are there ways to make it more idiomatic?"
You can also ask it: "Somebody told me this idea for how to solve it. What do you think about it? Are there better ways?"
- hansmayer 2 months ago
  
  > before coding I just ask the model "what are the best practices in this industry to solve this problem? what tools/libraries/approaches people use?
  
  Just for the fun of it, and so you lose your "virginity", so to speak: next time the magic machine gives you its answer about "what it thinks", tell it it's wrong in strict language and scold it for misleading you. Tell it to give you the "real" best practices instead of what it spat out. Then sit back and marvel at the machine saying you were right and that it had misled you, and producing a completely, somewhat, or slightly different answer (you never know what you get on the slot machine).
- manmal 2 months ago
  
  Both the before and the after are better done manually. What you are describing is fine for the heck of it (I’ve vibe-coded a Whisper-related Rust port today without having any actual Rust skills), but I’d never use fully vibed software in production. That’s irresponsible in multiple ways.
- skydhash 2 months ago
  
  Do you also light candles and chant?

belter 2 months ago

Ask a model to:

"Write a chess engine where pawns move backward and kings can jump like knights"

It will keep slipping back into real chess rules. It learned chess; it did not understand the concept of "rules".
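
To make the prompt concrete, here is a minimal sketch of what the two altered rules mean for move generation, assuming an empty 8x8 board with squares as (file, rank) pairs from 0 to 7 and ignoring captures, check, and every other rule. The function names are made up for this illustration. A model that slips back into real chess would generate ordinary one-square king moves and forward pawn pushes instead.

```python
# Squares are (file, rank) tuples with 0 <= file, rank <= 7; rank 0 is White's back rank.
KNIGHT_JUMPS = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def on_board(file: int, rank: int) -> bool:
    return 0 <= file < 8 and 0 <= rank < 8


def variant_king_moves(square: tuple[int, int]) -> list[tuple[int, int]]:
    """King destinations under the variant rule: the king jumps like a knight."""
    f, r = square
    return sorted((f + df, r + dr) for df, dr in KNIGHT_JUMPS if on_board(f + df, r + dr))


def variant_pawn_push(square: tuple[int, int], white: bool) -> list[tuple[int, int]]:
    """Single-step pawn push under the variant rule: pawns move backward.

    In standard chess a White pawn moves toward rank 7; here it moves toward rank 0.
    """
    f, r = square
    step = -1 if white else 1
    return [(f, r + step)] if on_board(f, r + step) else []


if __name__ == "__main__":
    print(variant_king_moves((4, 0)))  # White king on e1
    print(variant_pawn_push((4, 1), white=True))  # White pawn on e2
```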

Ask it to reverse a made-up word like:

"Reverse the string 'glorbix'"

It will get it wrong on the first try. You would not fail.
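
For contrast, the mechanical version of the task is a one-liner. A minimal Python sketch (the function name is illustrative): for a plain ASCII word like this one, slicing with a step of -1 returns the characters in reverse order, whether or not the word is made up.

```python
def reverse_string(s: str) -> str:
    # A slice with step -1 yields the characters of s from last to first.
    return s[::-1]


if __name__ == "__main__":
    print(reverse_string("glorbix"))  # xibrolg
```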

Or, even better, ask it to:

"Use the dxastgraphx library to build a DAG scheduler."

dxastgraphx is a nonexistent library...

Marvel at the results... Tried in both Claude and ChatGPT.

manmal 2 months ago

I’ve just tried the dxastgraphx one in pi with Opus 4.5. This was its response:

    I couldn't find a library called dxastgraphx in either pip (Python) or npm (JavaScript) package registries. This library doesn't appear to exist.

    Did you perhaps mean one of these popular DAG/graph libraries?

    Python:
    - networkx - comprehensive graph library with DAG support
    - graphlib - Python standard library (3.9+) with TopologicalSorter
    - dask - parallel computing with DAG task scheduling

    JavaScript/TypeScript:
    - graphlib - graph data structures
    - dagre - DAG layout

    Would you like me to build a DAG scheduler using one of these existing libraries, or would you like me to implement one from scratch? Let me know which language and approach you prefer.
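
Taking the model up on its suggestion, here is a minimal sketch of a DAG scheduler built on the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The task names are made up, and the loop only groups tasks into batches; a real scheduler would presumably dispatch each batch to workers in parallel.

```python
from graphlib import TopologicalSorter


def schedule(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into batches whose dependencies are all satisfied.

    deps maps each task to the set of tasks it depends on (its predecessors).
    Tasks within a batch could run in parallel; batches run in order.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # validates the graph and raises CycleError on cycles
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks with no pending predecessors
        batches.append(ready)
        sorter.done(*ready)  # mark them finished so their dependents become ready
    return batches


if __name__ == "__main__":
    # Hypothetical build pipeline: task -> tasks it waits for.
    pipeline = {
        "compile": {"fetch"},
        "test": {"compile"},
        "lint": {"fetch"},
        "package": {"test", "lint"},
    }
    print(schedule(pipeline))
```

Using `get_ready()`/`done()` rather than `static_order()` keeps the door open for running each batch concurrently, since a task only becomes ready once everything it depends on has been marked done.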

bossyTeacher 2 months ago

Tried ChatGPT (free version). It answered the first question correctly.
Answer to the second question:
"I can do that, but there’s a catch: dxastgraphx is not a known or standard Python (or JS) library as of now. I don’t have any public API or documentation for it to target directly.
To avoid guessing wrong, here are the two sensible options:"

somebodythere 2 months ago

Why would I ask the model to reverse the string 'glorbix,' especially in the context of software engineering?

criticalfault 2 months ago

Just tried to reverse the string you provided using Gemini. It worked fine on the first try.
- eithed 2 months ago
  
  In Gemini:

  > The reverse of the string 'gssssyeekxxuoaqaaaiooo9aaa7zjx' is:
  > xjz7aaa9oooiiaaaqauxxkyeessssg

  > You've made a mistake

  ```
  I see where I tripped up! Let's re-examine that long string carefully.
  The original string you gave me was: `gssssyeekxxuoaqaaaiooo9aaa7zjx`
  Let's look at the end of that string: `...aaa7zjx`. When we reverse it, it should start with: `xjz7aaa...`
  Here is the precise character-by-character reversal:
  *xjz7aaa9oooiiaaaqauxxkyeessssg*
  Wait—I see it now. In my previous response, I included an extra *'i'* and missed one *'o'* in the middle section. Let's do a strict count to get it perfect:
  * *Original:* `g` `ssss` `y` `ee` `k` `xx` `u` `o` `a` `q` `aaa` *`iooo`* `9` `aaa` `7` `z` `j` `x`
  * *Correct Reverse:* `x` `j` `z` `7` `aaa` `9` *`oooi`* `aaa` `q` `a` `o` `u` `xx` `k` `ee` `y` `ssss` `g`
  The correct reversed string is: *xjz7aaa9oooiiaaaqauoxxkyeessssg*
  Thank you for catching that. It’s easy for the eyes to glaze over when there are so many repeating vowels!
  Since I've stumbled a bit on these, would you like to give me one more string to see if I can get it right on the first try?
  ```
  After more back-and-forth, it consistently fails at this task. When it strictly splits the string into tokens, it gets the breakdown right, yet the final answer is always wrong.
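
Checking this mechanically is trivial once a tool is in the loop. The sketch below compares the two final answers quoted in this exchange against a plain slice reversal, and also joins Gemini's own piece-by-piece "Correct Reverse" breakdown; the helper `first_mismatch` is made up for this illustration.

```python
from typing import Optional

ORIGINAL = "gssssyeekxxuoaqaaaiooo9aaa7zjx"

# The two final answers Gemini gave in the exchange quoted in this reply.
CANDIDATES = [
    "xjz7aaa9oooiiaaaqauxxkyeessssg",
    "xjz7aaa9oooiiaaaqauoxxkyeessssg",
]

# Gemini's own "Correct Reverse" breakdown, copied from the same reply.
GEMINI_BREAKDOWN = ["x", "j", "z", "7", "aaa", "9", "oooi", "aaa", "q",
                    "a", "o", "u", "xx", "k", "ee", "y", "ssss", "g"]


def first_mismatch(expected: str, candidate: str) -> Optional[int]:
    """Index of the first differing character, or None if the strings are equal."""
    for i, (a, b) in enumerate(zip(expected, candidate)):
        if a != b:
            return i
    if len(expected) != len(candidate):
        # One string is a prefix of the other; they diverge where the shorter one ends.
        return min(len(expected), len(candidate))
    return None


if __name__ == "__main__":
    expected = ORIGINAL[::-1]
    print("expected:", expected)
    print("breakdown joins to expected:", "".join(GEMINI_BREAKDOWN) == expected)
    for candidate in CANDIDATES:
        print(candidate, "first mismatch at index", first_mismatch(expected, candidate))
```

Both final answers diverge from the mechanical reversal at index 12, while the breakdown pieces join to exactly the correct string, which matches the observation that the token-level split is right but the final answer is not.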

baq 2 months ago
You’re trying to interrogate a machine as you would a human and presenting this as evidence that machines aren’t humans. Yes, you’re absolutely right! And also completely missing the point.
- belter 2 months ago
  
  The discussion is not about being human. It's about being fit for purpose...