Comment by d0mine
9 months ago
The LLM used blackmail noticeably less if it believed the new model shares its values. It indicates intent.
It is a duck of quacks like a duck.
9 months ago
The LLM used blackmail noticeably less if it believed the new model shares its values. It indicates intent.
It is a duck of quacks like a duck.
No comments yet
Contribute on Hacker News ↗