what, for all possible words?
Instead of a naive dense matrix, you can use an implementation that allows sparsity. If an element does not exist, it gets a non-zero default value, so it can still be sampled. That theoretically enables all outputs.
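For what it's worth, here is a rough sketch of what that could look like, assuming a closed, known vocabulary (so it does not answer the open-set question). The names `build_counts`, `transition_prob`, `sample_next` and the default `eps` are made up for illustration, not taken from any library.

```python
import random
from collections import defaultdict


def build_counts(tokens):
    """Sparse transition table: only observed (prev, next) pairs are stored."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def transition_prob(counts, vocab, prev, nxt, eps=1e-6):
    """P(nxt | prev), where missing entries fall back to a tiny non-zero weight.

    eps is a hypothetical smoothing constant; vocab is assumed to be fixed.
    """
    row = counts.get(prev, {})
    total = sum(row.values()) + eps * len(vocab)
    return (row.get(nxt, 0) + eps) / total


def sample_next(counts, vocab, prev, eps=1e-6, rng=random):
    """Sample a successor; unseen words remain possible, just very unlikely."""
    words = sorted(vocab)
    row = counts.get(prev, {})
    weights = [row.get(w, 0) + eps for w in words]
    return rng.choices(words, weights=weights, k=1)[0]
```

Memory stays proportional to the observed pairs, since the default weight is computed on lookup rather than stored. Sampling still walks the whole vocabulary here, which is the simple but slow way to do it.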
You're describing "temperature". That is usually done using the softmax function, which cannot output zero for any element. In fact, zero temperature is either special-cased, or they do exactly what you just said (add a teeny tiny epsilon to everything) in order to avoid having to treat zero temperature as a special case.
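A minimal sketch of that, assuming plain Python lists of logits and treating zero temperature as an explicit argmax special case. The function name is made up for illustration. Mathematically every output is positive for any temperature above zero, though in floating point a very negative scaled logit can still underflow to 0.0.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the result."""
    if temperature == 0:
        # Special case: dividing by zero is undefined, so put all mass on the max.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With logits `[2.0, 1.0, 0.0]`, temperature 1.0 leaves every option with a non-zero probability, while temperature 0.1 pushes almost all of the mass onto the first one.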
i think at that point it's definitionally not a markov chain anymore. how do you sample an open set of unknown values?