Comment by CuriouslyC
9 months ago
At some point we're going to go from tokens to embeddings for everything. I saw some research on variable-length embeddings; I wouldn't be surprised if someone generated a huge embedding space, did some form of PCA on the generated embeddings, threw away the low-eigenvalue vectors, and then trained a distilled model that generates variable-length embeddings directly from that.
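To make that concrete, a rough sketch of the PCA-and-truncate step might look like the following (assuming numpy and scikit-learn; the data, dimensions, and thresholds are placeholders, and this is just one reading of the idea, not anyone's actual pipeline):

```python
# Illustrative sketch of "PCA the embedding space, drop low-eigenvalue
# directions, emit variable-length codes" -- all names and numbers made up.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stand-in for a large pile of model-generated embeddings (dimension 768 here).
# The linspace factor just fakes the decaying variance real embeddings tend to have.
embeddings = rng.normal(size=(10_000, 768)) * np.linspace(1.0, 0.01, 768)

# Step 1: PCA over the generated embeddings.
pca = PCA(n_components=768).fit(embeddings)

# Step 2: throw away directions with low eigenvalues (low explained variance).
keep = pca.explained_variance_ > 1e-3
coords = pca.transform(embeddings)[:, keep]

# Step 3: per-item variable length -- keep the shortest prefix of principal
# components holding, say, 95% of that embedding's energy. These truncated
# codes would be the targets the distilled model learns to produce directly.
def variable_length_code(x, energy=0.95):
    cumulative = np.cumsum(x**2) / np.sum(x**2)
    k = int(np.searchsorted(cumulative, energy)) + 1
    return x[:k]

codes = [variable_length_code(c) for c in coords]
print({len(c) for c in codes[:100]})  # a spread of lengths, not one fixed size
```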
> At some point we're going to go from tokens to embeddings for everything.
Yes, I agree.
Further down the road, I imagine we will end up finding interesting connections to the symbolic approaches of GOFAI, given that the embedding of a token, object, concept, or other entity in some vector space is basically a kind of symbol standing in for that entity within that space.
Interestingly, old terms like "representation" and "capsule," which didn't become as widely adopted as "embedding," tried more explicitly to convey this idea of using vectors/matrices of feature activations to stand in for objects, concepts, and other entities.
For example, see Figure 1 in this paper from 2009-2012: http://www.cs.princeton.edu/courses/archive/spring13/cos598C... -- it's basically what we're talking about!