Comment by ctoth

9 hours ago

Something I found really helpful when reading this was having read The Void essay:

https://github.com/nostalgebraist/the-void/blob/main/the-voi...

Great article! It does a good job of outlining the mechanics and implications of LLM prediction. It gets lost in the sauce in the alignment section though, where it suggests the Anthropic paper is about LLMs "pretending" to be future AIs. It's clear from the quoted text that the paper is about aligning the (then-)current, relatively capable model through training, as preparation for more capable models in the future.

That's an interesting alternative perspective. AI skeptics say that LLMs have no theory of mind. That essay argues that the only thing an LLM (or at least a base model) has is a theory of mind.

  • The standard skeptical position ("LLMs have no theory of mind") assumes a single unified self that either does or doesn't model other minds. But this paper suggests models have access to a space of potential personas, which they traverse based on conversational dynamics; steering away from one persona increases the model's tendency to identify as other entities. So it's less "no theory of mind" and more "too many potential minds, insufficiently anchored."