Comment by cplat

11 days ago

I don't understand. Deterministic and stochastic have very specific meanings. The statement: "To continue my reply I could say this word, more than the others, or maybe that one, a bit less, ..." sounds very much like a probability distribution.
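
To make that concrete, here is a minimal sketch with toy numbers. The three-word vocabulary and the logit values are made up for illustration and do not come from any real model; the point is only that "this word more, that one a bit less" is what you get when scores are normalized into a distribution and sampled from.

```python
import math
import random

def softmax(logits):
    """Turn arbitrary real-valued scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-word candidates and scores (assumed, not from a real model).
vocab = ["this", "that", "other"]
logits = [2.0, 1.0, 0.1]

# "This word more than the others, that one a bit less": a distribution.
probs = softmax(logits)

# Sampling from it is a stochastic step; the seed only makes the run repeatable.
rng = random.Random(0)
samples = rng.choices(vocab, weights=probs, k=1000)
counts = {word: samples.count(word) for word in vocab}

print(dict(zip(vocab, probs)))
print(counts)
```

Whatever the scores are taken to "mean", the thing being sampled from is non-negative and sums to 1.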

If you really want to think of it as a probability, think of it as "the probability to express correctly the sentence/idea that was modeled in the activations of the model for that token". That is totally different from "the probability that this sentence continues in a given way": the latter is about how this sentence continues in general, whereas the model picks tokens based on what it is modeling in the latent space.

  • That's not quite how auto-regressive models are trained (the "expression of ideas" bit). There is no notion of "ideas." Words are not defined the way we humans define them; they're only related to one another.

    And on the latent space bit, that's also true of classical models, and it's the basic idea behind any pattern recognition or dimensionality reduction. That doesn't necessarily mean it's "getting the right idea."

    Again, I don't want to "think of it as a probability." I'm saying that what you're describing is a probability distribution. Do you have a citation for the "probability to express correctly the sentence/idea" bit? Just having a latent space doesn't imply that an idea is being represented.