← Back to context

Comment by X-Ryl669

11 hours ago

I'm not 100% sure it's not possible. If (I don't know) it's possible to freeze the temperature of the model so it's deterministic, and if you could make a map of produced words back to tokens (via HMM probably), then you can probably alter a minimal input and observe the output to model it. If you perform waves of such minimal alterations, you can expect to be able to locate the distance where each alteration impact the model (the idea being that a small alteration on output is likely due to the last layers of the models, and a small alteration is likely due to the deeper layer). Once you've located most of the last layer(s?) weights, you can try to solve for them. With a hundreds of billions weights model, the last layers will likely be so huge that it's probably unfeasible technically, but it's theoretically possible.