Comment by seanhunter

4 days ago

In general this is a cool article but I worry when TFA observes that cosine similarity and the dot product are the same under certain conditions (specifically that the vectors are unit vectors). It's really important that people who are using these measures understand something as basic as this. It shouldn't need to be said in an article but I feel (as the author obviously also does) that it does need to be said because people are just blindly using it without understanding it at all.

Cosine similarity literally comes from solving the geometric formula for the dot product of two Euclidean vectors to find cos theta, so of course it's the same. That is to say

a . b = |a||b| cos theta

Where a and b are the two vectors and theta is the angle between them. Therefore

cos theta = (a . b)/(|a||b|)

TADA! cosine similarity.[1]

If the vectors are unit vectors (he calls this "normalization" in the article) then |a||b| = 1 so of course cos theta = a . b in that case.
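To make the unit-vector point concrete, here is a minimal sketch in plain Python. The helper names (`dot`, `norm`, `normalize`, `cosine_similarity`) and the example vectors are made up for illustration, not taken from the article.

```python
import math

def dot(a, b):
    # a . b = sum of elementwise products
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    # |a| = sqrt(a . a)
    return math.sqrt(dot(a, a))

def normalize(a):
    # Scale a to unit length (assumes a is not the zero vector)
    n = norm(a)
    return [x / n for x in a]

def cosine_similarity(a, b):
    # cos theta = (a . b) / (|a||b|)
    return dot(a, b) / (norm(a) * norm(b))

a = [3.0, 4.0]
b = [4.0, 3.0]

raw_dot = dot(a, b)                          # 24.0, not a cosine
cos_theta = cosine_similarity(a, b)          # 24 / (5 * 5)
unit_dot = dot(normalize(a), normalize(b))   # dot product of the unit vectors

print(raw_dot, cos_theta, unit_dot)
```

The raw dot product (24) differs from cos theta (about 0.96), but once both vectors are normalized the plain dot product matches cos theta up to floating-point rounding.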

If you don't understand this, I really recommend you invest an afternoon in something like Khan Academy's "vectors" track from their precalculus syllabus. Understanding the underlying basic maths will really pay off in the long run.[2]

[1] If you have ever been confused by why it's called cosine similarity when the formula doesn't include a cosine that's why. The formula gives you cos theta. You would take the arccosine if you wanted to get theta but if you're just using it for similarity you may as well not bother to compute the angle and just use cos theta itself.
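A small sketch of why skipping the arccosine is harmless for ranking: cos theta decreases monotonically as theta goes from 0 to pi, so sorting by descending cosine gives the same order as sorting by ascending angle. The vectors and labels below are invented for illustration.

```python
import math

def cos_sim(a, b):
    # cos theta = (a . b) / (|a||b|)
    d = sum(x * y for x, y in zip(a, b))
    return d / (math.hypot(*a) * math.hypot(*b))

def angle(a, b):
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain
    return math.acos(max(-1.0, min(1.0, cos_sim(a, b))))

query = [1.0, 0.0]
candidates = {
    "near": [2.0, 1.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.5],
}

# Most similar first: highest cosine, or equivalently smallest angle
by_cosine = sorted(candidates, key=lambda k: cos_sim(query, candidates[k]), reverse=True)
by_angle = sorted(candidates, key=lambda k: angle(query, candidates[k]))

print(by_cosine, by_angle)
```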

[2] Although ML people are just going to keep on misusing the word "tensor" to refer to a mere multidimensional array. I think that ship has sailed and I just need to give up on that now but there's still hope that people at least understand vectors when they work on this stuff. Here's an amazing explanation of what a tensor actually is for anyone who is interested https://www.youtube.com/watch?v=f5liqUk0ZTw

A few nitpicks:

First, I am not sure whom you are referring to, as (I hope) everyone who uses cosine similarity has seen a . b = |a||b| cos theta. I read its very name as "cosine (of the angle between vectors) used as a similarity measure".

Second, cos theta = (a . b)/(|a||b|) is pretty much how you define the angle between vectors, when working in Hilbert spaces.

Third, you pick a very narrow view of tensor, one based on spatial coordinates (hence covariant and contravariant indices). But even in physics the notion is broader: e.g. in quantum physics, a two-qubit state lives in the tensor product of two single-qubit state spaces. Sure, both in terms of states and operators, you have a notion of covariance and contravariance (bras and kets, respectively). In mathematics, it is even broader: all you need is two vector spaces and ⊗.

In terms of deep learning (at least in most cases), there is no such notion of co- and contravariance. Yet, the tensor product makes sense, as (say) we can have an outer product between the sample and channels. Quite a few operations could be understood that way, e.g., so-called 1x1 convolutions that mix channels but do not do anything spatially and channel-wise.
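As a rough illustration of the 1x1 convolution remark, here is a plain-Python sketch (no framework). It assumes a channels-first layout `[C_in][H][W]` and a weight matrix `[C_out][C_in]`; the function name `conv1x1` and the numbers are hypothetical. Each output pixel is a linear mix of the input channels at that same pixel, with no spatial neighbourhood involved.

```python
def conv1x1(x, w):
    """Mix channels per pixel: x is [C_in][H][W], w is [C_out][C_in]."""
    c_in, height, width = len(x), len(x[0]), len(x[0][0])
    return [
        [
            [sum(w[o][c] * x[c][i][j] for c in range(c_in)) for j in range(width)]
            for i in range(height)
        ]
        for o in range(len(w))
    ]

# Two input channels on a 2x2 grid
x = [
    [[1.0, 2.0], [3.0, 4.0]],      # channel 0
    [[10.0, 20.0], [30.0, 40.0]],  # channel 1
]

# Three output channels, each a fixed linear combination of the inputs
w = [
    [1.0, 1.0],   # out 0 = ch0 + ch1
    [0.0, 2.0],   # out 1 = 2 * ch1
    [1.0, -1.0],  # out 2 = ch0 - ch1
]

y = conv1x1(x, w)
print(y)
```

The same weight matrix is applied at every spatial position, which is why the operation factors as "identity on space, linear map on channels".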

A few notes here:

https://github.com/stared/thinking-in-tensors-writing-in-pyt...

> Although ML people are just going to keep on misusing the word "tensor" to refer to a mere multidimensional array.

Could you elaborate on the difference?

I was under the impression that beyond the fact that arrays are a computer science concept and tensors are more of a math/physics concept, for all intents and purposes, they are isomorphic.

How is a tensor more than just a multidimensional array?

  • Definitely not an expert, so I’m on a journey learning this stuff, but as I understand it at the moment, a multidimensional array can represent a tensor, but to be a tensor it needs the specific additional property that it “transforms like a tensor”: as you apply some transformation to its components, its basis vectors transform in such a way as to preserve the “meaning” of the tensor. An example will make this clear. Say I am in Manhattan and I have a vector (rank 1 tensor) which points from my current position to the top of the Empire State Building. I can take the components of this vector in Cartesian (x,y,z) form and represent it as ai + bj + ck, where i, j, and k are the Cartesian basis vectors. However, I can use another representation if I want to. Say I transform this vector so I’m using spherical coordinates: the basis vectors will transform using the inverse of whatever transformation I did on the xyz components, so the new basis vectors multiplied by the new components will give me the exact same actual vector I had before (i.e. it will still point from me to the Empire State Building).

    • Replying to myself to explain:

      - The components of the vector (in whatever coordinate system) are simply an array

      - The combination of components + basis vectors + operators that transform components and basis vectors in such a way as to preserve their relationship is a tensor

      In ML (and computer science more broadly), people often use the word tensor just to mean a multi-dimensional array. ML people do use tensor products etc. so they maybe have more justification than some folks for using the word, but I'm not 100% convinced. Not an expert, as I say.
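A minimal sketch of the "components plus basis" idea, using a simple 2D linear change of basis rather than spherical coordinates (an assumption made to keep the arithmetic short). The matrix `B`, the helper names, and the numbers are all invented for illustration. The component array changes, but components times basis vectors rebuilds the same vector.

```python
def mat_vec(m, v):
    # Matrix-vector product for small dense lists
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def inverse_2x2(m):
    # Closed-form inverse of a 2x2 matrix (assumes a nonzero determinant)
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Columns of B are the new basis vectors written in the standard basis:
# e1' = (2, 0), e2' = (1, 1)
B = [[2.0, 1.0],
     [0.0, 1.0]]

v_old = [3.0, 4.0]                      # components in the standard basis
v_new = mat_vec(inverse_2x2(B), v_old)  # components transform with the inverse of B

# Rebuild the vector: new components times new basis vectors
rebuilt = mat_vec(B, v_new)

print(v_old, v_new, rebuilt)
```

The two component arrays differ, yet both describe the same arrow; an array with no basis attached carries no such guarantee.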