Comment by kaimac

2 years ago

I found these notes very useful; they also contain a nice summary of how LLMs/transformers work. It doesn't help that people keep taking a concept that has been around for decades (kernel smoothing) and giving it a fancy new name (attention).

http://bactra.org/notebooks/nn-attention-and-transformers.ht...
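The parallel the comment draws can be sketched in a few lines. This is an illustrative comparison of my own, not code from the linked notes; both functions compute a weighted average of values, differing only in where the weights come from:

```python
import numpy as np

def kernel_smooth(query, keys, values, bandwidth=1.0):
    """Nadaraya-Watson kernel smoothing: a weighted average of `values`,
    with weights from a Gaussian kernel on query-key distance."""
    w = np.exp(-0.5 * ((query - keys) / bandwidth) ** 2)
    return (w / w.sum()) @ values

def attention(query, keys, values):
    """Scaled dot-product attention: the same weighted average,
    but with weights from a softmax over query-key dot products."""
    scores = keys @ query / np.sqrt(query.shape[-1])
    w = np.exp(scores - scores.max())  # numerically stable softmax
    return (w / w.sum()) @ values
```

In both cases the output is a convex combination of the values; the "fancy new name" refers mainly to swapping the fixed kernel for a learned similarity.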

It's just as bad as "convolutional neural networks" instead of "images being scaled down".

  • “Convolution” is a pretty well established word for taking an operation and applying it sliding-window-style across a signal. Convnets are basically just a bunch of Hough transforms with learned convolution kernels.
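That sliding-window definition is short enough to write out directly. A minimal sketch (my own, "valid" mode only, with the kernel flip that distinguishes convolution from cross-correlation):

```python
import numpy as np

def conv1d(signal, kernel):
    """Apply `kernel` sliding-window-style across `signal`.
    The kernel is flipped, matching the mathematical definition."""
    k = kernel[::-1]
    n = len(signal) - len(kernel) + 1
    return np.array([signal[i:i + len(kernel)] @ k for i in range(n)])
```

A convnet layer is this operation with `kernel` left as a learned parameter (and extended to 2-D and multiple channels).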