Comment by measurablefunc

1 day ago

I think you're misunderstanding the joke.

Yes, the joke is:

    [A B]

times

    [1]
    [1]

is

    [A+B]
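For anyone who wants to check the arithmetic, here is a minimal sketch in plain Python. The helper `matmul` and the sample values of `A` and `B` are made up for illustration; nothing here comes from the thread beyond the 1x2 times 2x1 product itself.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

# Two arbitrary 10-digit numbers (illustrative values only).
A = 1234567890
B = 9876543210

row = [[A, B]]      # 1x2 matrix [A B]
ones = [[1], [1]]   # 2x1 column of ones

result = matmul(row, ones)  # 1x1 matrix holding A + B
print(result)
```

The single entry of `result` is `A*1 + B*1`, which is all the joke relies on.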

  • From context, then, I infer that a transformer is not composed of matrix multiplications, because otherwise it would simply be one that adds two 10-digit numbers.

    • A transformer tokenizes its input, then does a bunch of matmuls and ReLUs set up in a certain way. It doesn't get to see the raw number (just as you don't: when you look at 1+1, you need your visual cortex etc. first).
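A rough sketch of that point, assuming a made-up character-level vocabulary and arbitrary weights. This is not a real tokenizer or a real transformer (there is no attention and nothing is trained); it only shows that what reaches the matmul and ReLU is a list of token IDs, not the numbers themselves.

```python
# Hypothetical character-level vocabulary; real tokenizers split text differently.
VOCAB = {ch: i for i, ch in enumerate("0123456789+")}

def tokenize(text):
    """Map each character to its integer token ID."""
    return [VOCAB[ch] for ch in text]

def one_hot(i, n):
    """Length-n vector with a 1.0 at position i."""
    return [1.0 if j == i else 0.0 for j in range(n)]

def linear(W, v):
    """Matrix-vector product, W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def relu(v):
    return [max(0.0, x) for x in v]

# Arbitrary illustrative weights: 2 output units, one column per vocabulary entry.
n = len(VOCAB)
W = [[float(j - 5) for j in range(n)],
     [float(5 - j) for j in range(n)]]

token_ids = tokenize("12+34")
hidden = [relu(linear(W, one_hot(t, n))) for t in token_ids]
print(token_ids)
print(hidden)
```

The layer only ever sees one-hot rows indexed by token ID, so the "12" in the input is two separate tokens rather than the integer twelve.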
