Comment by YetAnotherNick
4 days ago
I wish there were benchmarks for these scenarios. Anyone who has used LLMs knows that they are very different from humans. And after a certain amount of context, it becomes irritating to talk to these LLMs.
I don't want my LLM to excel at the IMO or Codeforces. I want it to understand my significantly easier but complex-to-state problem, think of solutions, recognize its own mistakes and resolve them, rather than be passive-aggressive.
"Benchmarks" in AI are hilarious. These tools can't even solve problems which are moderately more difficult than something that has a geeks4geeks page, but according to these benchmarks they are all IOI gold medallists. What gives?
The benchmarks are created by humans. So are the training sets. It turns out the sorts of problems that humans like to benchmark with are also the sorts of problems humans like to discuss wherever that training set was scraped.
Well, that, and the whole field is filled with AI hype men who "contribute" by asking ChatGPT about the quality and validity of some other GPT's response.
LLMs can't think. They are not rational actors; they can only generate plausible-looking text.
Maybe so, but they boost my coding productivity, so why not?
(Not the LLMs mentioned here, though.)
I do the rational acting, and it does the rest.
You're being reductive. A system should be evaluated on its measurable properties more than anything else.
Being "reductive" is how we got where we are today. We try to form hypotheses about things so that we can reduce them to their simplest model. This understanding then leads to massive gains. We've been doing this ever since we have observed things like the behavior of animals in order that we could hunt them more easily.
In the same way, it helps a lot to work out the correct model of an AI so that we can use it more productively. Certainly, based on its 'measurable properties', it does not behave like a reasonable human being: some of the time it does, and some of the time it goes completely off the rails. So there must be some other model that is more useful. "They are not rational actors, they can only generate plausible-looking text" seems more useful to me. "They are rational actors" would be closer to magical thinking, which is not what got us to where we are today.