Comment by spopejoy

2 months ago

Sorry to RFELI5, but ... I thought a "token" was a word? The example trains on names and the output is new improvised names, implying that a character is a token? Or do all LLMs operate at the character level?

Also, is there some minimum amount of training data? E.g. if you just trained on "True" and "False", I assume the output would be a .5 Bernoulli? I guess I'm asking what the minimum is to see "interesting" results.
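To be concrete about the degenerate case I mean, here's a toy sketch (my own assumption of how the minimal setup would behave, not anything from the repo): a count-based unigram model fit on just the two tokens "True" and "False" in equal proportion.

```python
# Hypothetical minimal-data case: a count-based unigram "model"
# trained on only two tokens appearing equally often.
from collections import Counter

data = ["True", "False", "True", "False"]  # toy corpus, equal counts

counts = Counter(data)
total = sum(counts.values())
probs = {tok: c / total for tok, c in counts.items()}

print(probs)  # each token gets probability 0.5, i.e. a fair Bernoulli
```

So with that little data the model can only reproduce the base rates, which is what makes me wonder how much data is needed before anything "interesting" emerges.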