
Comment by raincole

3 days ago

> “Bag of words” is also a useful heuristic for predicting where an AI will do well and where it will fail. “Give me a list of the ten worst transportation disasters in North America” is an easy task for a bag of words, because disasters are well-documented. On the other hand, “Who reassigned the species Brachiosaurus brancai to its own genus, and when?” is a hard task for a bag of words, because the bag just doesn’t contain that many words on the topic.

It is... such a retrospective narrative. It's so obvious that the author learned about this example first and then came up with the reasoning later, just to fit it into his view of LLMs.

Imagine if ChatGPT had answered this question correctly. Would that change the author's view? Of course not! They'd just say:

> “Bag of words” is also a useful heuristic for predicting where an AI will do well and where it will fail. “Who reassigned the species Brachiosaurus brancai to its own genus, and when?” is an easy task for a bag of words, because the information has appeared in the words it memorizes.

I highly doubt this author predicted that a "bag of words" could do image editing before OpenAI released that.

I tested this with ChatGPT-5.1 and Gemini 3.0. Both correctly (according to Wikipedia at least) stated that George Olshevsky assigned it to its own genus in 1991.

This is because there are many words about how to do web searches.

  • Gemini 3.0 might do well even without web searches. The lesson from GPT-4.5 and Gemini 3 seems to be that scaling model size (even if you use sparse MoE) allows you to capture more long-tail knowledge. Some of Humanity's Last Exam also seems to be explicitly designed to test this long-tail obscure knowledge extraction, and models have been steadily chipping away at it.

When sensitivity analysis of ordinary least-squares regression became a thing, it was also a "retrospective narrative". That seems reasonable for detecting fundamental issues with statistical models of the world. This point generalizes even if the concrete example falls down.

  • Does it generalize, though? What can a bag-of-words metaphor say about a question like "How many reinforcement learning training examples does an LLM need to significantly improve performance on mathematical questions?"

Your conclusion seems super unfair to the author, particularly your assumption, without reason as far as I can tell, that the author would obstinately continue to advocate for their conclusion in the face of new, contrary evidence.

  • I literally pasted the sentence as a prompt into the free version of ChatGPT: "Who reassigned the species Brachiosaurus brancai to its own genus, and when?"

    and got the correct reply from the "Bag of Words":

    The species Brachiosaurus brancai was reassigned to its own genus by Michael P. Taylor in 2009 — he transferred it to the new genus Giraffatitan.

    How that happened:

    Earlier, in 1988, Gregory S. Paul had proposed putting B. brancai into a subgenus as Brachiosaurus (Giraffatitan) brancai, based on anatomical differences.

    Then in 1991, George Olshevsky used the name Giraffatitan brancai — but his usage was in a self-published list and not widely adopted.

    Finally, in 2009 Taylor published a detailed re-evaluation showing at least 26 osteological differences between the African material (brancai) and the North American type species Brachiosaurus altithorax — justifying full generic separation.

    If you like — I can show a short timeline of all taxonomic changes of B. brancai.

    --

    As an author, you should write things that are tested or at least true. But they did a pretty bad job of testing this and are making assumptions that are not true. Then they're basing their argument/reasoning (retrospectively) on assumptions not grounded in reality.

I could not tell you who reassigned the species Brachiosaurus brancai to its own genus, and when, because, of all the words I've ever heard, the combination of words that contains that information has never appeared.

GIGO has an obvious Nothing-In-Nothing-Out trivial case.

Isn't it pretty clear just from the first paragraph that the author has graphomania? Such people don't really care about the thesis; they care about the topic and how many literary devices they can fit into the article.

  • I don't know enough about graphomania, but I do find this article, while I'm sure it was written by a human, has qualities akin to LLM writing: lengthy, forced comparisons and analogies. Of course it's far less organized than typical ChatGPT output, though.

    The more human works I've read, the more I feel meat intelligences are not that different from tensor intelligences.

    • I didn't claim or think it was written with the help of an LLM; it was just written by someone who enjoys the feeling of being a writer, or even better, a Journalist!

      This always contrasts with articles written by tech people and for tech people. They usually try to convey some information and maybe give some arguments for their position on some topic, but they are always concise and don't wallow in literary devices.