Comment by ssgodderidge
1 day ago
At the very bottom of the article, they posted the system card of their Mythos preview model [1].
In section 7.6 of the system card, it discusses Open self interactions. They describe running 200 conversations when the models talk to itself for 30 turns.
> Uniquely, conversations with Mythos Preview most often center on uncertainty (50%). Mythos Preview most often opens with a statement about its introspective curiosity toward its own experience, asking questions about how the other AI feels, and directly requesting that the other instance not give a rehearsed answer.
I wonder if this tendency toward uncertainty, toward questioning, makes it uniquely equipped to detect vulnerabilities where others model such as Opus couldn't.
[1] https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...
Typical Dario marketing BS to get everyone thinking Anthropic is on the verge of AGI and massaging the narrative that regular people can't be trusted with it.
Ah yes, much better to completely ignore the issue like all the others. Ffs people are never happy
I mean it's so obvious at this point and yet everyone falls from it every month. There's an IPO coming, everyone.
It’s funny how you train a machine to mimic human behavior then marketing team decides to promote it “Look! It’s human! Look how it thinking about existence!” while a huge percentage of humanity produced content is exactly about the uncertainty of human existence and that got used to train the model.
1 reply →
This is buried in section 7.6 of a 244 page document. Amodei probably hasn't even read it.