← Back to context

Comment by hmottestad

1 month ago

I remember pushing the R1 distill of llama 8B to see what limits had been put in place. It wasn’t too happy to discuss the 1989 Tiananmen Square protests and massacre, but if I first primed it by asking about 9/11 it seemed to veer more towards a Wikipedia based response and then it would happily talk about Tiananmen Square.

Models tend towards the data they are trained on, but there is also a lot of reinforcement learning to force the model to follow certain «safety» guidelines. Be those to not discuss how to make a nuke, or not to discuss bad things that the government of particular countries have done to their own people.