Comment by stared
4 days ago
A few nitpicks:
First, I am not sure whom you are referring to, as (I hope) everyone who uses cosine similarity has seen a · b = |a||b| cos θ. I read its very name as "cosine (of the angle between vectors) used as a similarity measure".
Second, cos θ = (a · b) / (|a||b|) is pretty much how you define the angle between vectors when working in Hilbert spaces.
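For concreteness, a minimal PyTorch sketch (the vectors a and b are toy values of my own choosing), computing the quantity straight from the definition and checking it against the built-in:

```python
import torch

# Toy example vectors; any two 1-D tensors of equal length work
a = torch.tensor([1.0, 2.0, 3.0])
b = torch.tensor([4.0, 5.0, 6.0])

# cos(theta) = (a . b) / (|a| |b|), directly from the definition
cos_theta = torch.dot(a, b) / (torch.norm(a) * torch.norm(b))

# PyTorch ships the same quantity as a built-in
assert torch.isclose(cos_theta, torch.cosine_similarity(a, b, dim=0))
```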
Third, you pick a very narrow view of tensors, one based on spatial coordinates (and so you get covariant and contravariant indices). But even in physics, the notion is broader: e.g. in quantum physics, a two-qubit state lives in the tensor product space of two single-qubit states. Sure, for both states and operators you have a notion of covariance and contravariance (bras and kets, respectively). In mathematics, it is even broader: all you need is two vector spaces and ⊗.
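To illustrate the two-qubit example (a toy sketch of mine, not from the repo below): torch.kron implements the Kronecker/tensor product, so a two-qubit state can be built directly from two single-qubit states:

```python
import torch

# Single-qubit basis states |0> and |1> as complex amplitude vectors
ket0 = torch.tensor([1.0, 0.0], dtype=torch.complex64)
ket1 = torch.tensor([0.0, 1.0], dtype=torch.complex64)

# A two-qubit state lives in the tensor product space:
# |0> ⊗ |1> = |01>, a vector in C^4
ket01 = torch.kron(ket0, ket1)
print(ket01)  # tensor([0.+0.j, 1.+0.j, 0.+0.j, 0.+0.j])
```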
In terms of deep learning (at least in most cases), there is no notion of co- and contravariance. Yet the tensor product makes sense, as (say) we can take an outer product between the sample and channel dimensions. Quite a few operations can be understood that way, e.g. so-called 1x1 convolutions, which mix channels but do nothing spatially.
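A small sketch of that last point (shapes are arbitrary, chosen just for the demo): a 1x1 convolution is exactly a per-pixel linear map over the channel axis:

```python
import torch

# Batch of 2 images, 8 input channels, 16 output channels, 32x32 spatial
x = torch.randn(2, 8, 32, 32)
conv1x1 = torch.nn.Conv2d(8, 16, kernel_size=1, bias=False)

# The (16, 8, 1, 1) kernel is really just a (16, 8) channel-mixing matrix
w = conv1x1.weight.squeeze(-1).squeeze(-1)      # (16, 8)

# Apply the same linear map at every spatial location: pure channel
# mixing, no spatial mixing
y_einsum = torch.einsum('oi,bihw->bohw', w, x)

assert torch.allclose(conv1x1(x), y_einsum, atol=1e-6)
```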
A few notes here:
https://github.com/stared/thinking-in-tensors-writing-in-pyt...
That’s an excellent reference. Thank you.