Comment by citizenpaul
1 day ago
It's really a high-level bikeshed. Obviously we are all still using and experimenting with LLMs. However, there is a huge gap in experience and overall usefulness depending on the exact task.
The majority of HNers still reach for LLMs pretty regularly, even though they fail badly with some frequency. That's really the pit the tech is stuck in. Sometimes it one-shots your answer perfectly, or pair-programs with you perfectly for one task, or notices a bug you didn't. Sometimes it wastes hours of your time for various subtle reasons. Sometimes it adamantly insists 2 + 2 = 55.
The latest reasoning models don't claim 2 + 2 = 55, and it's hard to find them making any sort of obviously false claim, or refusing to admit a mistake if you point one out.
I can't go a full conversation without obviously false claims. They will insist you are correct and that your correction is completely right, even when that correction is also wrong.
Ironically, the start of this thread was bemoaning the use of anecdotal evidence.
It was clearly a simplified example; like I said, endless bikeshed.
Here is a real one. I was using the much-lauded new Gemini 3(?) last week and wanted it to do something a slightly specific way, for reasons. I told it explicitly and added it to the instructions: DO NOT USE FUNCTION ABC.
It immediately used FUNCTION ABC. I asked it to read its instructions back to me, and it confirmed what I had put there. So I asked it again to change it to another function. It told me that FUNCTION ABC was not in the code, even though it was clearly right there.
I did a bit more prodding, and it adamantly insisted that the code it had generated did not exist, again and again and again. Yes, I tried reversing the instruction to USE FUNCTION XYZ. It still wanted to use ABC.
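For context on what "added it to the instructions" can look like: the commenter doesn't say which tool or SDK they used, but a minimal sketch with the google-genai Python SDK, assuming the constraint was passed as a system instruction, would be something like the following (the model ID, API key, and prompt text are placeholders, and FUNCTION ABC is the commenter's own placeholder name):

    # Minimal sketch, assuming the google-genai Python SDK; model ID,
    # API key, prompt, and FUNCTION ABC are all placeholders.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")

    # The negative constraint goes into the system instruction -- exactly
    # the kind of standing rule the model reportedly ignored.
    config = types.GenerateContentConfig(
        system_instruction="When editing this module, DO NOT USE FUNCTION ABC."
    )

    response = client.models.generate_content(
        model="gemini-model-id",  # placeholder; whatever model you have access to
        contents="Apply the change we discussed to the module.",
        config=config,
    )
    print(response.text)

Nothing in that setup enforces the constraint; the system instruction is still just text the model is free to ignore, which is the failure being described.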