Comment by dijksterhuis

19 hours ago

> It's not an error though

!define error

> 5. Mathematics: The difference between a computed or measured value and a true or theoretically correct value.

^ this is the definition that applies. There is a ground truth (the output the user expects to receive) and there is the model output. The difference between the two ==> error.

--

> From its training it's outputting things most likely to come next

Just because a model has gone through training does not mean it won't produce erroneous/undesirable/incorrect test-time outputs.

--

> Saying it's an error means that being accurate is a feature and a bug that can be fixed.

Machine learning doesn't revolve around a boolean "bug" / "not bug". It is a different ballgame. The types of test-time errors are sometimes just as important as the quantity of errors. Two of the simpler metrics for test-time evaluation of natural language models (note: not specifically LLMs) are WER (Word Error Rate) and CER (Character Error Rate). A model with a 3% CER isn't particularly helpful when its WER is 89%. There are still "errors"; they're just not something that can be fixed like normal software "errors".
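
For anyone unfamiliar with those metrics, here is a minimal sketch of how they can be computed with a plain Levenshtein (edit) distance; the function names and example strings are purely illustrative, not taken from any particular library:

```python
# Minimal sketch of WER / CER using a plain Levenshtein (edit) distance.
# Function names and example strings are illustrative only.

def levenshtein(ref, hyp):
    """Edit distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def wer(reference, hypothesis):
    """Word Error Rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character Error Rate: character-level edits divided by reference length."""
    return levenshtein(list(reference), list(hypothesis)) / len(reference)

ref = "the cat sat on the mat"   # ground truth
hyp = "the cat sax on the mat"   # model output: one wrong character => one wrong word
print(f"CER: {cer(ref, hyp):.1%}")  # ~4.5%  (almost every character is right)
print(f"WER: {wer(ref, hyp):.1%}")  # ~16.7% (but a whole word is wrong)
```

The point being: the same single mistake looks tiny under one metric and much larger under another, and neither is a "bug" you can patch out.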

It is generally accepted that some errors will occur in the world of machine learning.

- edit to add first response and formatting

I don't agree that that's the right definition to use though. LLMs do not output computed or measured values.

If I expect Windows to add $5 to my bank account every time I click the Start button, that's not an error in Windows; it's a problem with my expectations. Windows isn't actually made to do that. The Start button does what it's supposed to do (perhaps a bad example, because the Windows 11 Start menu is rubbish), not my imagined desired behavior.