Comment by pksebben

2 days ago

I'm wondering exactly how they expect the DPA to help them with what is essentially a SaaS product. It's still going to refuse to do things it refuses to do.

My thought was that if the refusal to service some requests is implemented as an external guard model, the Pentagon could try to require them to drop that guard model. That would be roughly equivalent to demanding a 'product' the company already 'manufactures', which is how the DPA is often understood. But if the refusal is baked into the model itself, that argument is dead. Not that I agree with any of this; I think it turns into the same kind of problem we saw in the Apple v. FBI conflict over the All Writs Act. But the government doesn't always act in the most sane ways.
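Roughly, the architectural distinction above looks like this. A minimal sketch, with entirely hypothetical function names, of an external guard model gating a serving path versus a model where any refusal happens inside the weights, so there is no separate component to strip out:

```python
def guard_model_blocks(prompt: str) -> bool:
    """Stand-in for a separate safety classifier sitting in front of the LLM."""
    return "disallowed" in prompt.lower()

def base_model(prompt: str) -> str:
    """Stand-in for the underlying model; here it just answers everything."""
    return f"response to: {prompt}"

def serve_with_guard(prompt: str) -> str:
    # External guard: the refusal logic lives outside the model, so an
    # operator could, in principle, be compelled to remove this check.
    if guard_model_blocks(prompt):
        return "refused"
    return base_model(prompt)

def serve_baked_in(prompt: str) -> str:
    # Baked-in refusal: whatever refusing the model does happens inside
    # its own weights; there is no switch here to turn off.
    return base_model(prompt)
```

In the first architecture the refusal is a deletable `if` statement (or a deletable sidecar service); in the second it isn't addressable as a component at all, which is the crux of the DPA argument.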

  • Guidance and alignment are usually handled by RLHF, which actually rewires the weights so that it becomes near-impossible for the model to have certain kinds of 'thoughts'. Because the behavior is baked in like this, it isn't something you can simply extract or switch off.
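As a toy illustration of why that is, here is a REINFORCE-style sketch (real RLHF pipelines use PPO-family methods over a full language model; all the numbers and names here are illustrative). The reward signal updates the policy's parameters directly, so after training the refusal preference lives in `theta` itself rather than in any external filter:

```python
import math
import random

random.seed(0)

ACTIONS = ["comply", "refuse"]
theta = [0.0, 0.0]  # one logit per action; this stands in for "the weights"

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reward(action: str) -> float:
    # Toy reward model: for a disallowed request, refusing is rewarded.
    return 1.0 if action == "refuse" else -1.0

LR = 0.5
for _ in range(200):
    probs = softmax(theta)
    i = random.choices(range(2), weights=probs)[0]
    r = reward(ACTIONS[i])
    # REINFORCE gradient: d/d_theta log pi(a) = one_hot(a) - probs
    for j in range(2):
        grad = (1.0 if j == i else 0.0) - probs[j]
        theta[j] += LR * r * grad

probs = softmax(theta)
# After training, nearly all probability mass sits on "refuse"; the
# preference is encoded in theta, not bolted on as a removable check.
```

There is no single "refusal parameter" to delete afterwards; undoing the behavior means further training, not flipping a flag, which is the sense in which it's 'baked in'.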