Comment by anupshinde

16 days ago

Just yesterday I had a moment

Claude's code in a conversation said - “Yes. I just looked at tag names and sorted them by gut feeling into buckets. No systematic reasoning behind it.”

It has gut feelings now? I confronted it for a minute, but pulled out. I walked away from my desk for an hour to not get pulled into the AInsanity.

>It has gut feelings now?

I would say hard no. It doesn't. But it's been trained on humans saying that in explaining their behavior, so that is "reasonable" text to generate and spit out at you. It has no concept of the idea that a human-serving language model should not be saying it to a human because it's not a useful answer. It doesn't know that it's not a useful answer. It knows that, based on the language it's been trained on, that's a "reasonable" (in terms of matrix math, not actual reasoning) response.

Way too many people think that it's really thinking and I don't think that most of them are. My abstract understanding is that they're basically still upjumped Markov chains.

It has a lot. I find by challenging it often, getting it to explain its assumptions, it's usually guessing.

This can be overcome by continuously asking it to justify everything, but even then...

  • Trust shouldn't be inherent in our adoption of these models.

    However, constant skepticism is an interesting habit to develop.

    I agree, continually asking it to justify may seem tiresome, especially if there's a deadline. Though with less pressure, "slow is smooth...".

    Just this evening, a model gave an example of 2 different things with a supposed syntax difference, with no discernible syntax difference to my eyes.

    While prompting for a 'sanity check', the model relented: "oops, my bad; i copied the same line twice". smh

    • I don't find it tiresome at all. What I was getting at was, even with constant justifications you need to remain vigilant.

  • It's almost like an emergent feature of a tool that's literally built on best guesses is...guesswork. Not what you want out of a tool that's supposed to be replacing professionals!

    • Interesting perspective.

      I guess I'm more interested in understanding what it can and can't do.