Comment by anon291

5 hours ago

The weights are uninteresting. People need to get it out of their heads that NNs are built on numbers. They're built on matrices, which are conveniently representable as numeric arrays but are their own thing. Similarly, the rational numbers are their own thing, and some of them are representable as 32-bit numbers via the IEEE 754 encoding (or as 16-bit numbers via a variety of encodings, etc.).
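To make the encoding point concrete, here is a minimal sketch using only the Python standard library. The helper name `as_float32` is made up for illustration. It rounds a value to the nearest IEEE 754 binary32 number and reads the result back as an exact rational, which shows that 1/2 has an exact 32-bit encoding while 1/10 does not.

```python
import struct
from fractions import Fraction

def as_float32(x: float) -> Fraction:
    """Round x to the nearest IEEE 754 binary32 value, returned as an exact rational."""
    # Packing with "<f" forces the value through a 32-bit float encoding.
    (rounded,) = struct.unpack("<f", struct.pack("<f", x))
    return Fraction(rounded)

half = as_float32(0.5)    # 1/2 is a power of two, so it survives exactly
tenth = as_float32(0.1)   # 1/10 has no finite binary expansion, so it gets rounded

print(half == Fraction(1, 2))
print(tenth == Fraction(1, 10))
print(tenth)  # the rational that the 32-bit pattern actually stands for
```

The rational number is the object; the bit pattern is just one way of writing some of them down.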

Matrices are interesting because they can encode any finite group (and many infinite ones). They're also interesting because they can encode arbitrary linear transformations of a finite-dimensional vector space. All of these things are interesting, and have nothing to do with numbers.
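As one small instance of a group encoded as matrices, the sketch below represents the cyclic group of order 4 (addition mod 4) by powers of a 90-degree rotation of the plane. The names `R` and `rep` are chosen here for illustration. The check is that multiplying matrices mirrors adding group elements.

```python
import numpy as np

# Generator of the cyclic group C4: rotate the plane by 90 degrees.
R = np.array([[0, -1],
              [1,  0]])

def rep(k: int) -> np.ndarray:
    """Matrix standing for the group element k (an integer mod 4)."""
    return np.linalg.matrix_power(R, k % 4)

# Homomorphism check: rep(a) @ rep(b) equals rep(a + b) for every pair.
is_representation = all(
    np.array_equal(rep(a) @ rep(b), rep(a + b))
    for a in range(4)
    for b in range(4)
)
print(is_representation)
```

The same group could be written with different entries in a different basis; what carries over is the multiplication structure, not the specific numbers.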

For any particular language model, you can always apply the same rotation consistently to the weight matrices, the embeddings, and so on, and get a perfectly reasonable model out that behaves exactly the same.

This is because the training process produces a particular geometry, so transformations that preserve that geometry preserve the structure of the network. The geometry is interesting; the numbers are not.
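A minimal NumPy sketch of the rotation argument, under loud assumptions: this is a toy model (embedding, one residual MLP block, unembedding) with random weights, no attention and no normalization layers, and every name in it is invented for the example. It draws a random orthogonal matrix `Q` (a rotation, possibly combined with a reflection), applies it consistently to everything that reads from or writes to the residual stream, and compares the outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model, d_hidden = 50, 16, 64

# Toy parameters (random stand-ins for trained weights).
E = rng.normal(size=(vocab, d_model))        # token embeddings
W_in = rng.normal(size=(d_model, d_hidden))  # MLP input projection
W_out = rng.normal(size=(d_hidden, d_model)) # MLP output projection
W_U = rng.normal(size=(d_model, vocab))      # unembedding

def logits(tokens, E, W_in, W_out, W_U):
    x = E[tokens]                    # read embeddings into the residual stream
    h = np.maximum(x @ W_in, 0.0)    # elementwise ReLU in the hidden basis
    x = x + h @ W_out                # residual connection
    return x @ W_U

# Random orthogonal matrix from a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(d_model, d_model)))

# Writers to the residual stream get Q on the right, readers get Q.T on the left.
E_r = E @ Q
W_in_r = Q.T @ W_in
W_out_r = W_out @ Q
W_U_r = Q.T @ W_U

tokens = np.array([3, 14, 15, 9, 26])
original = logits(tokens, E, W_in, W_out, W_U)
rotated = logits(tokens, E_r, W_in_r, W_out_r, W_U_r)

same_behavior = np.allclose(original, rotated)
different_numbers = not np.allclose(E, E_r)
print(same_behavior, different_numbers)
```

Note that only the residual-stream basis is rotated here; the hidden layer keeps its basis because the ReLU acts elementwise. Real architectures have more pieces that would need the same consistent treatment.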