Comment by kazinator
13 days ago
You don't get to pick what is "correct"; that's dictated by the training data. (Which could contain multiple versions that have either choice.)
13 days ago
>You don't get to pick what is "correct"; that's dictated by the training data. (Which could contain multiple versions that have either choice.)
Neither do you, yet you literally just brashly and overconfidently hallucinated, exactly like a poorly trained token-predicting LLM, that the:
>Correct prediction is "dogs".
Maybe I just trained on Wikipedia, or the February 9, 1885 edition of The Boston Journal "Current Notes" article, or Linda Bronson's 1888 book "Illustrative Shorthand", for what the statistically and historically "correct" completion is.
https://news.ycombinator.com/item?id=15886728
As ninetyninenine pointed out, we don't understand what's going on with you either:
>Someone can claim that the responses are an illusion and they have no meaning but that exact argument applies to humans and we can’t say anything either way. We don’t understand what’s going on.
What I meant was that the correct prediction is "dogs" under the assumption that "dogs" is what occurred in the arbitrarily selected training data. While we are training the thing, we want to get it to predict whatever is in the data.
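That point can be made concrete with a toy sketch (the corpus below is hypothetical, chosen only to illustrate the argument): a bare-bones frequency model "correctly" predicts whichever continuation dominates its training data, and nothing else.

```python
from collections import Counter

def most_likely_next_word(corpus, context):
    """Return the word that most frequently follows `context` in `corpus`."""
    words = corpus.split()
    n = len(context)
    followers = Counter(
        words[i + n]
        for i in range(len(words) - n)
        if words[i:i + n] == context
    )
    return followers.most_common(1)[0][0] if followers else None

# Hypothetical toy corpus: the "jumps ... dog" variant appears twice,
# the "jumped ... dogs" variant once, so the model picks "dog".
corpus = (
    "the quick brown fox jumps over the lazy dog . "
    "the quick brown fox jumps over the lazy dog . "
    "the quick brown fox jumped over the lazy dogs ."
)
print(most_likely_next_word(corpus, ["the", "lazy"]))  # -> dog
```

Swap the proportions in the corpus and the "correct" answer flips to "dogs", which is the whole point: correctness here is a property of the data, not of the predictor.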
> Why would you insert redundant letters into a lovely well-known succinct pangram
The 's' is not redundant, because my version of the pangram has "fox jumped" not "fox jumps".
I picked up this version as a kid and have always used that one when testing new keyboards.
> the whole point is to make the SHORTEST sentence with all letters of the alphabet
If that is the whole point, the fox-dog sentence must be abandoned for a shorter pangram.
The requirement those shorter pangrams tend not to satisfy is being easy to remember.
People don't care about the sentence being short, which is why variants starting with "The ..." instead of "A ..." appeared soon after the invention of the original, which itself could have dropped the "A".
>under the assumption that that is what has occurred in the arbitrarily selected training data
Under that assumption, the correct prediction is "dog", not "dogs", because LLMs are trained on the wikipedia page https://en.wikipedia.org/wiki/The_quick_brown_fox_jumps_over... [..._the_lazy_dog] and other earlier publications like The Boston Journal "Current Notes" article, and Linda Bronson's 1888 book "Illustrative Shorthand", and not the version you picked up as a kid.
Can you cite any sources that say "dogs" that are more popular than the wikipedia page or better known than the 19th century publications it cites?
I asked several popular LLMs what they think, even using four underscores to imply a four-letter word:
How much more proof do you need? Do you think all 5 LLMs and I are hallucinating? Can you show me any LLM that actually predicts "dogs"?
If you can't cite any wikipedia pages, books, articles, or other examples of an LLM that predicts "dogs", do you still stand by your bold claim that:
>Correct prediction is "dogs".
Your mistake was confidently but incorrectly using the word "correct". In the context of predicting the most likely word, the correct, highest probability answer by far is unequivocally "dog", not "dogs".
That's why the wikipedia page, which all mainstream LLMs are trained on, has the title that it does.
The other hypocritical mistake you made was saying the following right after YOU incorrectly tried to pick what is "correct":
>You don't get to pick what is "correct"
So am I to understand that even though I DON'T get to pick what is correct (which I'm not presuming to do: I'm just deferring to wikipedia, two 19th century publications, and 5 LLMs), YOU DO get to pick what is "correct", even if it contradicts all available evidence? Because that's exactly what you're doing! Please explain how you got such an awesomely important, exclusive, earth-shattering superpower. Were you bitten by a radioactive web crawler?
If you boldly claim to be the only person who has the actual power to pick what's correct and what's not, then there are so many other much more important wrongs you should be righting than arguing about "dog" vs "dogs" on hacker news.
For more evidence, you can even read the talk page and archive yourself. There are discussions about alternatives, but NONE of the alternatives ever use the word "dogs".
Talk: https://en.wikipedia.org/wiki/Talk:The_quick_brown_fox_jumps...
Talk Archive: https://en.wikipedia.org/wiki/Talk:The_quick_brown_fox_jumps...
The larger point is that your argument is reductionist, and you're exhibiting the same kind of mistakes that people point to when they criticize LLMs for their overconfident hallucinations.