Comment by pksebben
5 hours ago
Guidance and alignment are usually handled by RLHF (reinforcement learning from human feedback), which actually changes the model's weights so that it becomes near-impossible for the model to have certain kinds of 'thoughts'. This is baked into the weights themselves, so it isn't something you can simply extract or turn off.