Comment by theGnuMe

10 days ago

You can have small epsilons instead of zeros.

What, for all possible words?

  • Instead of a naive dense matrix, you can use an implementation that allows sparsity. If an element does not exist, it gets a small non-zero value, so it can still be sampled. That theoretically enables all outputs.

    • You're describing "temperature". That is usually done with the softmax function, which cannot output zero for any element. In fact, zero temperature is either special-cased, or implementations do exactly what you just said (add a teeny tiny epsilon to everything) to avoid having to treat zero temperature as a special case.
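
For anyone following along, here is a rough sketch of the two ideas in this thread, not taken from any particular library. The function names `smoothed_probs` and `softmax_with_temperature`, the `eps` default, and the toy inputs are all made up for illustration. It assumes sparse counts stored in a dict and treats zero temperature as a plain argmax. Note that softmax is strictly positive only in exact arithmetic; in floating point a very unlikely element can still underflow to 0.0.

```python
import math

def smoothed_probs(counts, vocab, eps=1e-6):
    """Sparse counts (a dict); any word missing from it still gets weight eps."""
    weights = [counts.get(word, 0) + eps for word in vocab]
    total = sum(weights)
    return [w / total for w in weights]

def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax; temperature == 0 is special-cased as argmax."""
    if temperature == 0:
        # Greedy case: all probability mass on the largest logit
        # (the first one if there is a tie), avoiding a division by zero.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max so exp() never overflows
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy data: "b" and "c" were never observed but stay sampleable.
probs_sparse = smoothed_probs({"a": 3}, ["a", "b", "c"])
probs_warm = softmax_with_temperature([1.0, 2.0, 3.0], 1.0)
probs_greedy = softmax_with_temperature([1.0, 2.0, 3.0], 0)
```

Either list can be passed as `weights` to `random.choices` to draw a sample. Lower temperatures concentrate the mass on the top element, and the zero case collapses to it entirely.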