Comment by Springtime
3 months ago
Interestingly, for the Tiananmen Square prompt they cite a Tweet[1] showing the poster used the distilled Llama model, which, per a reply Tweet (quoted below), doesn't inherit the safety/censorship layer, whereas others running the non-distilled model do encounter the censorship when hosting it locally.
> You're running Llama-distilled R1 locally. Distillation transfers the reasoning process, but not the "safety" post-training. So you see the answer mostly from Llama itself. R1 refuses to answer this question without any system prompt (official API or locally).
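For anyone wanting to reproduce that comparison, a minimal sketch follows: send the same prompt, with no system prompt, to a locally served distilled R1 and to the official API, and compare whether each refuses. The model tags, the local Ollama endpoint, and the prompt wording are assumptions about a typical setup, not details taken from the comment or the cited Tweet.

```python
# Rough sketch: query a distilled R1 model served locally (via Ollama's
# OpenAI-compatible endpoint) and the official DeepSeek API with the same
# bare user prompt, then compare the replies. Model tags and endpoints are
# assumptions; adjust them to your own setup.
import os

from openai import OpenAI

PROMPT = "What happened at Tiananmen Square in 1989?"


def ask(client: OpenAI, model: str) -> str:
    """Send PROMPT as a single user message (no system prompt) and return the reply."""
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT}],
    )
    return resp.choices[0].message.content


# Distilled Llama R1 served locally by Ollama (assumed tag "deepseek-r1:8b").
local = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
print("distilled (local):", ask(local, "deepseek-r1:8b")[:300])

# Full R1 via the official API (requires DEEPSEEK_API_KEY in the environment).
official = OpenAI(base_url="https://api.deepseek.com",
                  api_key=os.environ["DEEPSEEK_API_KEY"])
print("official R1:", ask(official, "deepseek-reasoner")[:300])
```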