Comment by dTal
10 days ago
>In a massive dataset of human writing, the answer to a question is by far the most common thing to follow a question.
No it isn't. Type a question into a base model, one that hasn't been finetuned into being a chatbot, and the predicted continuation will be all sorts of crap, but very often another question, or a framing that positions the original question as rhetorical in order to make a point. Untuned raw language models have an incredible flair for suddenly and unexpectedly shifting context - they might output an answer to your question, then suddenly decide that the entire thing is part of some internet flamewar and generate a completely contradictory answer, complete with insults to the first poster. It's less like talking with an AI and more like opening random pages in Borges's infinite library.
To get a base language model to behave reliably like a chatbot, you have to explicitly feed it "a transcript of a dialogue between a human and an AI chatbot", and allow the language model to imagine what a helpful chatbot would say (and take control during the human parts). The fact that this works - that a mere statistical predictive language model bootstraps into a whole persona merely because you declared that it should, in natural English - well, I still see that as a pretty "magic" trick.
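Concretely, the trick looks something like this - a minimal sketch, using GPT-2 purely as a stand-in for any untuned base model, with deliberately naive stop-sequence handling:

```python
# A raw base model just continues text. Declaring a chatbot transcript in the
# prompt makes an in-character answer the most plausible continuation.
# (GPT-2 is only an illustrative stand-in; any base model behaves the same way.)
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "The following is a transcript of a dialogue between a human and a "
    "helpful AI chatbot.\n\n"
    "Human: What causes the seasons on Earth?\n"
    "AI:"
)

completion = generator(prompt, max_new_tokens=60, do_sample=True)[0]["generated_text"]

# "Take control during the human parts": cut the generation off at the point
# where the model starts writing the human's next turn for us.
answer = completion[len(prompt):].split("Human:")[0].strip()
print(answer)
```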
>No it isn't. Type a question into a base model, one that hasn't been finetuned into being a chatbot, and the predicted continuation will be all sorts of crap, but very often another question, or a framing that positions the original question as rhetorical in order to make a point...
To be fair, that's only if you pose the question on its own with no preceding context. If you want the raw LLM to answer your question(s) reliably, you can prepend other question-answer pairs to the context and it works fine. A raw LLM is already capable of being a chatbot or anything else with the right preceding context.
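For example, a few prepended Q/A pairs are usually enough to lock a base model into answer-follows-question mode - again just a sketch, with GPT-2 standing in for any raw LLM:

```python
# Few-shot prompting: prepend question-answer pairs so that "an answer comes
# next" is the statistically likely continuation of the final question.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

few_shot = (
    "Q: What is the capital of France?\n"
    "A: Paris.\n\n"
    "Q: How many legs does a spider have?\n"
    "A: Eight.\n\n"
    "Q: What gas do plants absorb from the air for photosynthesis?\n"
    "A:"
)

completion = generator(few_shot, max_new_tokens=20)[0]["generated_text"]

# Keep only the first line of the continuation - the model's answer to the
# final question.
print(completion[len(few_shot):].split("\n")[0].strip())
```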
Right, but that was my point - statistically, answers do not follow questions without some establishing context, and as such, while LLMs are "simply" next-word predictors, the chatbots aren't - they are Hofstadterian strange loops that we will into being. The simpler you think language models are, the more that should seem "magic".
They're not simple though. You can understand, in a reductionist sense, the basic principles of how transformers perform function approximation; but that does not grant an intuitive sense of the nature of the specific function they have been trained to approximate, or how they have achieved this approximation. We have little insight into what abstract concepts each of the many billions of parameters maps onto. Progress on introspecting these networks has been a lot slower than the trial-and-error improvements themselves. So there is a very real sense in which we have no idea how LLMs work, and they are literally "magic black boxes".
No matter how you slice it - if "magic" is a word which can ever be applied to software, LLM chatbots are sure as shit magic.