Comment by ssgodderidge

1 day ago

At the very bottom of the article, they posted the system card of their Mythos preview model [1].

Section 7.6 of the system card discusses open self-interactions. They describe running 200 conversations in which the model talks to itself for 30 turns.

> Uniquely, conversations with Mythos Preview most often center on uncertainty (50%). Mythos Preview most often opens with a statement about its introspective curiosity toward its own experience, asking questions about how the other AI feels, and directly requesting that the other instance not give a rehearsed answer.

I wonder if this tendency toward uncertainty, toward questioning, makes it uniquely equipped to detect vulnerabilities where other models such as Opus couldn't.

[1] https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...

6 comments

dakolli 1 day ago

Typical Dario marketing BS to get everyone thinking Anthropic is on the verge of AGI and massaging the narrative that regular people can't be trusted with it.

khalic 18 hours ago

Ah yes, much better to completely ignore the issue like all the others. Ffs, people are never happy.

airstrike 1 day ago

I mean it's so obvious at this point and yet everyone falls for it every month. There's an IPO coming, everyone.
- mgambati 1 day ago
  
  It’s funny how you train a machine to mimic human behavior, then the marketing team decides to promote it with “Look! It’s human! Look how it’s thinking about existence!” while a huge percentage of human-produced content is exactly about the uncertainty of human existence, and that got used to train the model.
  
- qnleigh 11 hours ago
  
  This is buried in section 7.6 of a 244-page document. Amodei probably hasn't even read it.