Comment by kelseyfrog

2 hours ago

> How can you get a machine to have values?

The short answer is a reward function. The long answer is the alignment problem.

Of course, everything in the middle is what matters. Explicitly defined reward functions are complete but not consistent. Data-defined rewards are potentially consistent but incomplete. It's not a solvable problem for machines, and equally not for humans. Still, we practice, improve, and muddle through despite this and, hopefully, approximate improvement over long enough timescales.

Well, it’s pretty clear to me that the current reward function of profit maximization has a lot of downsides that aren’t sufficiently taken into account.