Comment by akoboldfrying

2 days ago

Certainly! What surprised me was that LLM tokenizers apparently deliberately allow multiple ways of encoding the same string as tokens. I had assumed this would lead to inefficiency, since training wouldn't know whether it should favour outputting, say, se|same or ses|ame after "open", and would therefore throw some weight on each. But provided there's a deterministic rule, like "always choose the longest matching token", that uncertainty goes away.
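
Here's a toy Python sketch of what I mean, using a made-up vocabulary (real tokenizers like BPE are built from learned merge rules, so this is only an illustration): the same string admits several valid encodings, but a greedy longest-match rule always yields exactly one.

```python
# Toy vocabulary chosen to make "sesame" ambiguous; not from any real tokenizer.
VOCAB = {"s", "e", "a", "m", "se", "ses", "same", "ame", "sesame"}

def longest_match_tokenize(text, vocab):
    """Deterministic greedy rule: at each position, take the longest vocab
    entry that matches. Every string gets exactly one encoding."""
    tokens = []
    i = 0
    while i < len(text):
        for length in range(len(text) - i, 0, -1):  # try longest piece first
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"no token matches at position {i}")
    return tokens

def all_tokenizations(text, vocab):
    """Enumerate every way to split the string into vocab entries,
    to show the ambiguity the greedy rule removes."""
    if not text:
        return [[]]
    results = []
    for length in range(1, len(text) + 1):
        piece = text[:length]
        if piece in vocab:
            for rest in all_tokenizations(text[length:], vocab):
                results.append([piece] + rest)
    return results

print(all_tokenizations("sesame", VOCAB))       # includes ['se', 'same'], ['ses', 'ame'], ...
print(longest_match_tokenize("sesame", VOCAB))  # always ['sesame']
```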

LLMs are probabilistic black boxes; trying to inject determinism into their natural-language processing (as opposed to, e.g., forcing a grammar on the output) may very well screw them over completely.

  • LLMs are ultimately just matrix multiplication and some other maths; nothing about them is inherently nondeterministic. When nondeterminism is present, it's because it was deliberately sprinkled on top (sampling tends to produce better results than always picking the most likely token).

    • Yes, "determinism" is not the best word. What I mean is that if you force the LLM to output "carr"+"o" even when it prefers "carro", this could result in worse-quality output (see the sketch below).
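
A toy sketch of that worry, with made-up probabilities (nothing here comes from a real model): greedy argmax decoding is fully deterministic, the nondeterminism people see is sampling added on top of it, and forcing an unpreferred split like "carr"+"o" pushes the model through a token it considers unlikely.

```python
import math
import random

# Hypothetical next-token distribution the model might assign after some prefix;
# the numbers are invented purely to illustrate the point.
next_token_probs = {"carro": 0.60, "car": 0.20, "ca": 0.15, "carr": 0.05}

# 1) Nothing forces an LLM to be nondeterministic: greedy (argmax) decoding
#    always picks the same token for the same input.
greedy = max(next_token_probs, key=next_token_probs.get)
print(greedy)  # 'carro', every time

# 2) Nondeterminism is added on top by sampling from the distribution
#    (usually with a temperature), because it tends to give better text.
sampled = random.choices(list(next_token_probs), weights=next_token_probs.values())[0]
print(sampled)  # varies from run to run

# 3) Forcing the tokenization "carr" + "o" makes the model emit a token it
#    considers unlikely and then continue from a state it rarely saw in
#    training -- that's the "worse quality output" concern.
print(math.log(next_token_probs["carr"]))   # log P of the forced first piece (0.05)
print(math.log(next_token_probs["carro"]))  # log P of the model's preferred token (0.60)
```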