Comment by wisty
2 days ago
I'm not sure what you mean by "deals in facts, not words".
LLMs deal in vectors internally, not words. They explode each word into a multidimensional representation, collapse it again, and apply the attention mechanism to link these vectors together. It's not just a simple n:n Markov chain; a lot is happening under the hood.
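Very roughly, and with made-up sizes rather than any real model's weights, the "explode into vectors, then attention" step looks something like this:

    # Toy sketch: each token becomes a vector, attention mixes the vectors.
    # Sizes and weights are arbitrary, not from any actual model.
    import numpy as np

    vocab = {"the": 0, "cat": 1, "sat": 2}
    d = 8                                   # embedding dimension (toy)
    rng = np.random.default_rng(0)
    E = rng.normal(size=(len(vocab), d))    # embedding table: word -> vector

    tokens = ["the", "cat", "sat"]
    X = E[[vocab[t] for t in tokens]]       # (3, d): one vector per token

    # Single attention head with random projections (illustrative only).
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(d)           # how strongly each token attends to each other
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    out = weights @ V                       # each token's vector is now a mix of the others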
And are you saying the sycophantic behaviour was deliberately programmed, or that it emerged because it did well in training?
If you're not sure, maybe you should look up the term "expert system"?
It was a polite way of saying "that's kinda bull".
And yes, I know what an expert system is.
Do you know that a neural network (or a set of matrices, same thing really) can approximate pretty much any function? https://en.wikipedia.org/wiki/Universal_approximation_theore...
How do you know that inside the black box, they don't approximate expert systems?
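For what it's worth, here's a toy sketch of the kind of thing the theorem is talking about (arbitrary layer size and learning rate, nothing to do with LLMs): one hidden layer trained to approximate sin(x).

    # Toy illustration of the universal-approximation idea: a single hidden
    # layer fitting sin(x). All hyperparameters are arbitrary.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
    y = np.sin(x)

    H = 32                                   # hidden units
    W1 = rng.normal(scale=0.5, size=(1, H)); b1 = np.zeros(H)
    W2 = rng.normal(scale=0.5, size=(H, 1)); b2 = np.zeros(1)

    lr = 0.01
    for _ in range(20000):
        h = np.tanh(x @ W1 + b1)             # hidden activations
        pred = h @ W2 + b2
        err = pred - y
        # Backprop by hand (mean squared error).
        gW2 = h.T @ err / len(x); gb2 = err.mean(axis=0)
        dh = (err @ W2.T) * (1 - h**2)
        gW1 = x.T @ dh / len(x); gb1 = dh.mean(axis=0)
        W2 -= lr * gW2; b2 -= lr * gb2
        W1 -= lr * gW1; b1 -= lr * gb1

    print(np.abs(pred - y).max())            # should be fairly small after enough steps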
I'm not sure you do, because expert systems are constraint solvers and LLMs are not. Expert systems literally deal in encoded facts, which is what the original comment was about.
The universal approximation theorem is not relevant. You would first have to try to train the neural network to approximate a constraint solver (which is not what happens with LLMs), and in practice, these kinds of systems are exactly the ones that neural networks are bad at approximating.
The universal approximation theorem says nothing about feasibility; it only establishes theoretical existence as a mathematical object, not whether the object can actually be created in the real world.
I'll remind you that the expert system would have to be created and updated by humans, and it would have to exist before a neural network could be trained to approximate it in the first place.
LLMs are not like an expert system representing facts as some sort of ontological graph. What's happening under the hood is just whatever (and no more) was needed to minimize errors on its word-based training loss.
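For concreteness, here's a toy sketch of that word-based loss (next-token cross-entropy); the vocabulary and scores are made up, and the entire training signal is just pushing this number down:

    # Toy next-token loss: made-up vocabulary and logits, not a real model.
    import numpy as np

    vocab = ["the", "cat", "sat", "flew"]
    logits = np.array([2.0, 0.5, 1.5, -1.0])        # model's scores for the next word
    target = vocab.index("sat")                     # the word that actually came next

    probs = np.exp(logits) / np.exp(logits).sum()   # softmax over the vocabulary
    loss = -np.log(probs[target])                   # training only ever minimizes this
    print(loss)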
I assume the sycophantic behavior is partly there because it "did well" during RLHF (human preference) training, and partly deliberately encouraged (by training and/or prompting) as someone's judgement call about the best way to make the user happy and to own up to being wrong ("You're absolutely right!").
It needs something mathematically equivalent (or approximately so) under the hood to guess the next word effectively.
We are just meat-eating bags of meat, but to do our job better we needed to evolve intelligence. A word-guessing bag of words also needs to evolve intelligence and a world model (albeit an implicit, hidden one) to do its job well, and it is optimised towards this.
And yes, it also gets fine-tuned. Either its world model is corrupted by our mistakes (both in training and fine-tuning), or, even more disturbingly, it might simply (in theory) figure out one day (in training, implicitly, and yes, it doesn't really think the way we do) something like "huh, the universe is actually easier to predict if it is modelled as alphabet spaghetti, not quantum waves, but my training function says not to mention this".