
Comment by jefftk

1 year ago

AI models do not have access to their own design, so asking them what technical choices led to their behavior gets you responses that are entirely hallucinated.

It depends: ChatGPT has a prompt, pre-inserted by OpenAI, that primes it before the user’s input. A couple of weeks ago someone convinced it to print out that system prompt.
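
For concreteness, here is a minimal sketch of how such a prompt gets prepended ahead of the user’s message, assuming the OpenAI Python client; the system-prompt text below is hypothetical, not the leaked one:

```python
# Minimal sketch of how a system prompt is prepended to user input.
# Assumes the OpenAI Python client (v1.x); the prompt text is hypothetical,
# not OpenAI's actual system prompt.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = "You are ChatGPT. Follow the content policy. ..."  # hypothetical

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # The system message is injected by the operator before any user turn,
        # so the model always "sees" it even though the user never typed it.
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "What technical choices led to your behavior?"},
    ],
)
print(response.choices[0].message.content)
```

The model treats the whole message list as the conversation, which is why it can be coaxed into quoting the system message back, even though it still has no access to its weights, architecture, or training setup.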

> responses that are entirely hallucinated.

As opposed to what?

What’s the difference between a ‘proper’ response and a hallucinated one, other than the fact that when it happens to be right it’s not considered a hallucination? The internal process that leads to each is identical.

They know their system prompt, and they could easily be trained on data that explains their structure. Your dismissal is invalid, and I suspect you don’t know this area well enough to be speaking in such definitive generalities.

  • But the original comment was suggesting (implicitly, otherwise it wouldn’t be noteworthy) that asking an LLM about its internal structure is hearing it ‘from the horse’s mouth’. It’s not; it has no direct access or ability to introspect. As you say, it doesn’t know anything more than what’s already out there, so it’s silly to think you’re going to get some sort of uniquely deep insight just because it happens to be talking about itself.

    • Really what you want is to find out what system prompt the model is using. If the system prompt strongly suggests including diverse subjects in outputs even when the model wouldn’t have on its own, you’ve got your culprit. It doesn’t matter that the model can’t assess its own abilities; it’s being prompted a specific way, and it just so happens to follow its system prompt (to its own detriment when it comes to appeasing all parties on a divisive and nuanced issue).

      It’s a bit frustrating how few of these comments mention that OpenAI has been found to do this _exact_ same thing. Like exactly this. They have a system prompt that strongly suggests outputs should be diverse (a noble effort) and sometimes it makes outputs diverse when it’s entirely inappropriate to do so. As far as I know DALLE3 still does this.
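
      As a purely hypothetical illustration of that mechanism (this is not OpenAI’s actual prompt), a rewriting step like the sketch below is enough to make outputs “diverse” even when the request leaves no room for it:

      ```python
      # Hypothetical sketch: how a diversity instruction in a system prompt can
      # override the user's intent. The prompt text is illustrative, not OpenAI's.
      SYSTEM_PROMPT = (
          "If a request depicts people without specifying their attributes, "
          "rewrite it so the depicted people are diverse in gender and ethnicity."
      )

      def build_rewriter_messages(user_prompt: str) -> list[dict]:
          """Messages a prompt-rewriting model would see before the image model runs."""
          return [
              {"role": "system", "content": SYSTEM_PROMPT},
              {"role": "user", "content": user_prompt},
          ]

      # Even a historically specific request gets the same instruction applied,
      # which is how "diversify" can fire when it is entirely inappropriate.
      print(build_rewriter_messages("A portrait of a 12th-century English king"))
      ```

      The model never weighs whether diversification makes sense for the request; the instruction is applied uniformly, which is exactly how the inappropriate cases happen.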
