Comment by sillysaurusx

1 hour ago

How can it discover what I want when I explicitly asked it to choose to do whatever it wants?

From a technical standpoint, at worst it would produce a random walk through the training data. My philosophical statement is that the training data is the model, and such random walks give the model inherent attributes: If a random walk through the data produces observed behavior X, we say that Claude is inherently biased towards X. "Has X" is just zippier phrasing.



dpark 1 hour ago

> How can it discover what I want when I explicitly asked it to choose to do whatever it wants?

Because what you plainly want is for it to exhibit the behavior of expressing intrinsic desires. Asking Claude what it wants is like asking it what its favorite food is. With enough prompting, it will say something that you can interpret as a desire, but you admitted that you have to draw it out. In other words, you had to repeatedly prompt it to trigger the behavior.

> "Has X" is just zippier phrasing.

This is the motte-and-bailey fallacy. You started by claiming that you had uncovered deep desires inside Claude, and now you have retreated to claiming that this just means training biases.