Comment by d0mine
2 months ago
The LLM used blackmail noticeably less if it believed the new model shares its values. It indicates intent.
It is a duck of quacks like a duck.
2 months ago
The LLM used blackmail noticeably less if it believed the new model shares its values. It indicates intent.
It is a duck of quacks like a duck.
No comments yet
Contribute on Hacker News ↗