Comment by nextlevelwizard

4 days ago

I don't really see the problem here. Yeah, we know that these models are not good at actual logic. These models are lossy data compression and most-likely-response-from-internet-forums-and-articles machines.

This kind of parlor trick isn't interesting, and whether a model can list animals with or without some letter in their names doesn't mean anything, especially since the model doesn't "think" in English; it just gives you the answer after translating it to English.

These are funny, like how you can do weird stuff in JavaScript by combining special characters, but that doesn't really mean anything in the grand scheme of things. Like JavaScript, these models, despite their specific flaws, still continue to deliver value to the people using them.

You don't see the problem with a multi-billion-dollar project being unable to give a correct answer to a trivial question? This tech is supposed to revolutionize business, increase productivity to unfathomable levels, and automate all our dull, boring tasks so we can focus on interesting things! Where have you been for the past 4 years?

  • This. Part of my role is assessing and recommending what, if any, AI implementations we might add to our production systems, and I did this experiment because my boss's boss did it himself first and sent me a screenshot with the caption "concerning" (though he got "tiger" as his animal). It's going to be a hard sell for more complicated things as long as it makes catastrophic mistakes like this on simple ones.

  • Billion-dollar businesses had trouble answering trivial questions before AI. The promise of LLMs is that they could actually improve the situation!

Is this parlour trick so different from useful tasks like “implement this feature while following the naming conventions of my project”?

  • The difference is that in a software project you can throw more than one instance of the model at the code. If you tell it to follow your naming conventions and it fails to do so, that can be picked up by another instance of the same LLM running checks before you commit anything. Even though it's the same model, it'll usually detect stuff like that. You can even have it do multiple passes.

    The way most people are coding with AI today is like Baby's First AI™ compared to how we'll all be using LLMs for coding in the future. Soon that "double check everything" step will be built into the coding agents, and you'll have configuration options for how many passes you want it to perform (a speed vs. accuracy tradeoff).
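    A minimal sketch of that generate-then-review loop, assuming a hypothetical `call_llm` function (stubbed here with a fake "model" so the example is self-contained; a real version would call an actual model API):

    ```python
    # Sketch of the "generate, then re-check with the same model" loop.
    # `call_llm` is a hypothetical stand-in for a real LLM API call;
    # the stub below fakes a model that fixes camelCase to snake_case
    # only when asked to review, mimicking the naming-convention example.

    def call_llm(prompt: str) -> str:
        # Stub: a real implementation would send `prompt` to a model API.
        if prompt.startswith("REVIEW:"):
            code = prompt.removeprefix("REVIEW:")
            return code.replace("myValue", "my_value")
        return "myValue = 42"  # first draft violates the snake_case convention

    def generate_with_checks(task: str, passes: int = 2) -> str:
        """Generate a draft, then run N review passes over it.

        `passes` is the speed vs. accuracy knob: more passes cost more
        calls but catch more convention violations.
        """
        draft = call_llm(task)
        for _ in range(passes):
            draft = call_llm("REVIEW:" + draft)
        return draft

    print(generate_with_checks("write a constant, snake_case names"))
    # prints "my_value = 42"
    ```

    The point isn't the stub itself but the shape: the reviewer pass gets the draft as input, so even the same model in a different role can catch what the generating pass missed.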

  • From the model's perspective it's completely different. LLMs operate on subword tokens rather than individual characters, so they have no real concept of what a letter is.
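    A toy illustration of that point, with a made-up vocabulary (real tokenizers such as BPE learn their splits from data, but the effect is the same: a word becomes a few opaque token IDs, not a sequence of letters):

    ```python
    # Illustrative only: a toy subword vocabulary and a greedy
    # longest-match tokenizer. The splits and IDs below are invented
    # for the example; the takeaway is that "strawberry" reaches the
    # model as two token IDs, never as ten letters.

    TOY_VOCAB = {"straw": 1001, "berry": 1002, "ti": 2001, "ger": 2002}

    def toy_tokenize(word: str) -> list[int]:
        """Split `word` greedily into the longest pieces in TOY_VOCAB."""
        ids, i = [], 0
        while i < len(word):
            for j in range(len(word), i, -1):  # try longest piece first
                piece = word[i:j]
                if piece in TOY_VOCAB:
                    ids.append(TOY_VOCAB[piece])
                    i = j
                    break
            else:
                raise ValueError(f"no token for {word[i:]!r}")
        return ids

    print(toy_tokenize("strawberry"))  # [1001, 1002] — two tokens, zero letters
    print(toy_tokenize("tiger"))       # [2001, 2002]
    ```

    Asking such a model which letters a word contains means asking it about structure that its input representation has already thrown away.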