
Comment by simianwords

5 hours ago

If it bullshits so much, you wouldn't have a problem giving me an example of it bullshitting on ChatGPT (paid version), would you? Let's take any example of a text prompt fitting within a few pages; it may be a question in science, math, or any other domain. Can you get it to bullshit?

I like to let new models write a few lines of Latin poetry - they rarely get the meter right.
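Once you've marked the syllable weights of a line, checking whether they scan as a hexameter is mechanical. Here's a toy sketch (my own illustration, not a real scanner; it ignores elision and the actual rules for determining syllable weight, and assumes weights are already given as 'L'/'S'):

```python
def is_hexameter(weights: str) -> bool:
    """Check whether a sequence of syllable weights ('L' = long,
    'S' = short) scans as a dactylic hexameter: five feet, each a
    dactyl (LSS) or spondee (LL), then a final foot of long + anceps."""
    def feet(rest: str, count: int) -> bool:
        if count == 5:
            # Final foot: a long syllable plus one syllable of either weight.
            return len(rest) == 2 and rest[0] == 'L'
        # Try a dactyl, then a spondee, backtracking on failure.
        if rest.startswith('LSS') and feet(rest[3:], count + 1):
            return True
        if rest.startswith('LL') and feet(rest[2:], count + 1):
            return True
        return False
    return feet(weights, 0)

# "Arma virumque cano, Troiae qui primus ab oris" scans D D S S D S:
print(is_hexameter('LSSLSSLLLLLSSLL'))  # → True
```

The hard part, of course, is producing the weight string in the first place; that is exactly where the models trip up.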

I don't have access to paid ChatGPT right now, but here's Opus 4.6 with extra thinking enabled: https://claude.ai/share/6e0e8ef5-06e4-4514-ba7e-299357c1fc55

The initial draft fucks up the meter in lines 3 and 8; the final version gets line 2 wrong ("venit meis") and is somewhat obnoxious in that verses 2 and 8 basically repeat each other. The thinking trace is useless and gives us no clue why the model exchanged a bland but metrically correct first distich for a more interesting but metrically incorrect one.

In fact, the "careful" examination of its own output completely skips the erroneously modified half-verse in line 2 - now, tell me that's a coincidence and not a sign of bullshitting.

I think you highlight one of the problems with users of LLMs: You can't tell anymore if it is BS or not.

I caught Claude the other day hallucinating code that was not only wrong but dangerously wrong, causing tasks to fail and never recover. But it certainly wasn't obvious.

To me it’s the other way around. It’s difficult to trust (paid) ChatGPT’s output consistently.

When I need exact, especially up-to-date, facts, I have to constantly double-check everything.

Even though I split my sessions into projects by topic, it regularly mixes things up in subtle and not-so-subtle ways. It seems to have no actual understanding of continuity, and especially not of causality.

It’s _very_ easy to lead it astray and get it to confidently echo false assumptions.

In any case, I’ve become more precise at prompting and better at spotting when it fails. I think the trick is to not take its output too seriously.

> If it bullshits so much, you wouldn't have a problem giving me an example of it bullshitting on ChatGPT (paid version)?

There's an entire paragraph in the essay about aphyr's direct experience with ChatGPT failures and sustained bullshitting that we'd never expect from a moderately-skilled human who possesses at least two functioning braincells. That paragraph begins "I have recently argued for forty-five minutes with ChatGPT". Do notice that there are six sentences in the paragraph. I encourage you to read all of them (make sure to check out the footnote... it's pretty good).

The exact text of the ChatGPT session is irrelevant; even if you reported that you were unable to reproduce the issue, that would only reinforce one of the underlying points, namely that these systems are unreliable. aphyr has a fairly extensive body of published work, which suggests he'd be unlikely to fabricate a story of an LLM repeatedly failing to accomplish a task that any moderately-skilled human could accomplish when equipped with the proper tools. So, I believe that his report is true and accurate.

  • There's also this seven-week-old example [0] (linked in the essay) of ChatGPT very confidently recommending an asinine course of action because it was unable to understand what the hell it was being told.

    Listening to the audio is not required, as there's a reasonably accurate on-screen transcript, but it is worth hearing just how hard they've worked to make this tool sound both confident and capable, even in situations where it's soul-crushingly incorrect. Those of us who have worked in Blasted Corporate Hellscapes may recognize how this manner of speaking can be very, very compelling to a certain sort of person (who, as it turns out, is frequently found in a management position).

    [0] <https://www.instagram.com/reel/DUylL79kvub/>

    • This is a classic case of not using the proper version. Use the thinking version, gpt5.4 (text), and tell me if it bullshits.

      Surely you must be able to find at least one example, no?
