Comment by fnord77

16 hours ago

Can you reach into the model and "transplant" weights directly?

I'm not 100% sure it's impossible. If (I don't know) you can freeze the model's sampling temperature so that its output is deterministic, and if you could map the produced words back to tokens (probably via an HMM), then you could make a minimal change to the input and observe how the output changes in order to model it. By performing waves of such minimal alterations, you could expect to locate the depth at which each alteration affects the model (the idea being that a small change in the output is likely due to the last layers of the model, and a large change is likely due to the deeper layers). Once you've located most of the weights of the last layer (or layers?), you can try to solve for them. For a model with hundreds of billions of weights, the last layers will likely be so huge that this is probably technically infeasible, but it's theoretically possible.
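
To make the "solve for them" step concrete, here's a toy sketch in Python. It is not an attack on a real model: it assumes the black box is a single linear layer that hands back its raw numeric outputs, which a hosted LLM does not do (you only see tokens, and the network is nonlinear). The names `make_black_box`, `recover_weights` and `secret` are made up for the illustration.

```python
def make_black_box(secret_weights):
    """Return a query function that hides the weights (toy stand-in for an API)."""
    def query(x):
        # One linear layer: each output is a weighted sum of the inputs.
        return [sum(w * xi for w, xi in zip(row, x)) for row in secret_weights]
    return query


def recover_weights(query, n_inputs):
    """Probe with one-hot inputs; each response is one column of the weights."""
    columns = []
    for i in range(n_inputs):
        probe = [0.0] * n_inputs
        probe[i] = 1.0  # minimal alteration: switch on a single input
        columns.append(query(probe))
    # Transpose the collected columns back into rows.
    return [list(row) for row in zip(*columns)]


# Hypothetical 2x3 weight matrix that the "outsider" never sees directly.
secret = [[0.5, -1.0, 2.0], [3.0, 0.25, -0.75]]
black_box = make_black_box(secret)
recovered = recover_weights(black_box, n_inputs=3)
print(recovered == secret)
```

Even in this best case you need one query per input dimension, which hints at why the same idea gets out of hand for a layer with billions of weights.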

No, you'd need to have the model on your filesystem for direct access, and then the architecture would need to be the same.

You can do things like that - one example is averaging weights between related models - but not with Anthropic's models, because outsiders don't have access to the weights.
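
For the curious, here's a minimal sketch of what weight averaging looks like when you do have the weights. It assumes each model is just a dict mapping parameter names to flat lists of floats; `average_weights`, `model_a` and `model_b` are hypothetical names for the example, and real checkpoints would hold tensors rather than lists. The key check mirrors the point above: the two models need the same architecture.

```python
def average_weights(state_a, state_b, alpha=0.5):
    """Interpolate two models' weights: alpha * a + (1 - alpha) * b."""
    if state_a.keys() != state_b.keys():
        raise ValueError("models must have the same parameter names")
    merged = {}
    for name in state_a:
        a, b = state_a[name], state_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
    return merged


# Two tiny made-up "models" with identical architecture.
model_a = {"layer.weight": [1.0, 2.0, 3.0], "layer.bias": [0.0]}
model_b = {"layer.weight": [3.0, 4.0, 5.0], "layer.bias": [1.0]}

merged = average_weights(model_a, model_b)
print(merged)  # {'layer.weight': [2.0, 3.0, 4.0], 'layer.bias': [0.5]}
```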