Comment by simianwords

10 hours ago

Hi, my position on the issue is that LLMs are powerful but may make mistakes in long-context problems like coding (which the harness solves through feedback), but they make close to no (undergrad-level) mistakes on questions that fit in 2-3 pages. For you personally: do you believe me on this specific part about 2-3 pages?

I don't know what aphyr did, and tbh his whole screed on LLMs makes me feel he didn't use them properly, or is at least coming from a bad-faith angle.

That's why I'm asking you (and others). Please come up with a text prompt spanning < 4 pages and let's see if it bullshits.

Surely the implication of such a screed is that it should be super simple to find at least one example of it clearly bullshitting within my constraint, no? Or am I interpreting the post in a bad-faith way?

15 comments


simoncion 10 hours ago

Neat.

So, despite the fact that it looks like you have to pay for ChatGPT Voice mode with video, [0] it doesn't count as an

> example of it bullshitting on ChatGPT (paid version)

That is, father_phi's use of what seems to be a paid version of ChatGPT to have a bullshit-filled conversation that definitely spans less than four pages doesn't count?

[0] The page at [1] declares that the video feature is "Available in ChatGPT Plus, Pro, Business, Enterprise, and Edu on mobile"

[1] <https://chatgpt.com/features/voice-with-video/>

simianwords 9 hours ago
Let's stick to my challenge please: thinking version, find bullshit. If you can't, that's ok. Do you then accept that, under these constraints, the thinking version doesn't produce bullshit?
- simoncion 9 hours ago
  
  Given aphyr's vocation (and how very lucrative it is), and how years and years of his writing indicate that he's very devoted to getting a correct and complete answer when investigating a question, I find it hard to believe that he's not using a paid version of the LLMs. If I knew him, I'd ask and verify, but I don't, so I won't.
  > Lets stick to my challenge please...
  I did. Your challenge was literally:
  > If it bullshits so much, you wouldn't have a problem giving me an example of it bullshitting on ChatGPT (paid version)? Lets take any example of a text prompt fitting a few pages - it may be a question in science or math or any domain. Can you get it to bullshit?
  father_phi's two-sentence question about whether one can use a cup that's closed at the top and open at the bottom definitely counts. Given what I've mentioned about aphyr above, I expect he has already run your challenge on the fanciest-available version and reported on the results in the essay under discussion.
  