
Comment by fennecfoxy

2 months ago

Eh, I feel like that's mostly just down to this: yes, transformers are a "next token predictor", but during instruct fine-tuning the attention-related wagon slapped on the back gets partially hijacked as a bridge from input tokens to learned sequences of connections in the weights.

For example, if I ask "If I have two foxes and I take away one, how many foxes do I have?", I reckon attention has been hijacked to essentially highlight the "if I have x and take away y then z" portion of the query and connect it to a learned sequence from readily available training data (apparently the whole damn Internet), where there are plenty of examples of that math-question trope, just with some other object type than foxes.
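Rough sketch of what I mean by "highlight", assuming a Hugging Face causal LM (GPT-2 here purely as a small stand-in, since the claim is really about instruct-tuned models): dump the attention weights for that prompt and see which tokens the final position leans on.

```python
# Peek at which tokens the last position attends to for the fox question.
# GPT-2 is a placeholder model; swap in whatever causal LM you can run.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"  # placeholder, not the model being discussed
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_attentions=True)
model.eval()

prompt = "If I have two foxes and I take away one, how many foxes do I have?"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one tensor per layer, shape (batch, heads, seq, seq).
# Average over heads in the last layer and read off what the final token attends to.
last_layer = outputs.attentions[-1][0].mean(dim=0)  # (seq, seq)
attn_from_last_token = last_layer[-1]

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, weight in sorted(zip(tokens, attn_from_last_token.tolist()),
                          key=lambda x: -x[1])[:10]:
    print(f"{weight:.3f}  {tok}")
```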

I think we could probably test it by tracing the high-dimensional space the model works in: ask it variants of the same question and look for hotspots in that space that would indicate it's reusing those same sequences (with attention branching off to make sure it replies with the correct object type that was referenced).
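A toy version of that "hotspot" check, again with GPT-2 as a stand-in and mean-pooled hidden states as a crude proxy for "where the prompt lands" (so a sketch, not a rigorous interpretability method): variants of the question with different objects should sit much closer to each other than to an unrelated control.

```python
# Compare hidden-state embeddings of paraphrased variants vs an unrelated control.
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
model.eval()

prompts = [
    "If I have two foxes and I take away one, how many foxes do I have?",
    "If I have two apples and I take away one, how many apples do I have?",
    "If I have two chairs and I take away one, how many chairs do I have?",
    "What is the capital of France?",  # unrelated control
]

def embed(text):
    ids = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**ids)
    # Mean-pool the final hidden layer into one vector per prompt.
    return out.last_hidden_state[0].mean(dim=0)

vecs = [embed(p) for p in prompts]
cos = torch.nn.functional.cosine_similarity

for i in range(len(prompts)):
    for j in range(i + 1, len(prompts)):
        print(f"{cos(vecs[i], vecs[j], dim=0).item():.3f}  "
              f"{prompts[i][:35]!r} vs {prompts[j][:35]!r}")
```

If the variants cluster tightly and the control doesn't, that's at least weak evidence for the "same learned sequence, different object slot" idea.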