Comment by harrouet
3 hours ago
@hypendev I am not trying to start a flame war, but let me take a very simple example.
As another commenter put it, we know how to *build* deep-learning machines. No question about that. My claim is that we don't clearly understand why they produce the outputs we observe.
Let's imagine you have a model that detects cats in images with 95% accuracy. If you truly understood how the model worked, I could give you an image of a cat and you could _predict_ reliably whether the model would detect it.
Yet we are not able to do that: you have to feed the image to the model and observe the result. We cannot reliably (i.e. scientifically) predict the output, and we don't know how to train the model to detect that cat better without altering its other results. (Including the test image in the training set is, of course, forbidden.)
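The "you have to run it to know" point can be made concrete with a toy sketch in NumPy. Everything here is invented for illustration (a random two-layer network standing in for a trained detector, a hypothetical `detect_cat` function); a real model is vastly larger, but the principle is the same: nothing short of actually doing the arithmetic tells you the answer.

```python
import numpy as np

# Hypothetical toy "cat detector": a tiny two-layer network with fixed
# random weights standing in for a trained model.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((16, 8))  # input -> hidden
W2 = rng.standard_normal(8)        # hidden -> scalar "cat" score

def detect_cat(image_vec):
    """Forward pass: the only reliable way to learn the model's answer."""
    hidden = np.maximum(0, image_vec @ W1)  # ReLU
    score = hidden @ W2
    return bool(score > 0)  # True = "cat"

image = rng.standard_normal(16)  # stand-in for a flattened image
# Staring at W1 and W2 tells a human essentially nothing; the arithmetic
# has to be carried out to get the prediction.
print(detect_cat(image))
```

Even in this 100-or-so-parameter case, inspecting the weights by eye doesn't let you predict the output for a given input; you run the forward pass.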
Back to LLMs: we can't predict how they will behave. Therefore even world-class scientists at OpenAI, knowing about a Goblin issue and having assumptions about its cause, are not able to edit the model directly to fix it. They could if they understood it fully. Instead they are reduced to testing and hacking their way through.
Sorry if it sounded like that; I'm not trying to have a flame war either, just trying to pin down which part we don't _understand_, because the claim seems silly to me.
Yeah, we cannot predict a model's results with 100% accuracy, at least not mentally: to do that we would have to do the same math in our heads, and that's just ultra-rare, next-level intelligence. We could build a reliable predictor, but a reliable prediction model of a model's results would end up being the same model.
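The surrogate-predictor point can be sketched too: try to predict a nonlinear model's outputs with something strictly simpler and watch agreement fall short of 100%. All names, sizes, and the least-squares surrogate are assumptions made for illustration, not anyone's actual method.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.standard_normal((16, 32))
W2 = rng.standard_normal(32)

def model(x):
    # The "real" model: a nonlinear (ReLU) network, labels in {0, 1}.
    return (np.maximum(0, x @ W1) @ W2 > 0).astype(int)

# Surrogate: a linear classifier fit by least squares on the
# model's own labels (mapped to -1/+1 for the regression target).
X = rng.standard_normal((2000, 16))
y = model(X)
w, *_ = np.linalg.lstsq(X, 2 * y - 1, rcond=None)

# Measure how often the simpler surrogate agrees on fresh inputs.
X_test = rng.standard_normal((2000, 16))
agreement = np.mean((X_test @ w > 0).astype(int) == model(X_test))
print(f"surrogate agrees on {agreement:.0%} of inputs")
```

A linear surrogate can track a nonlinear decision boundary only approximately; to close the gap you keep adding capacity until the "predictor" is effectively the model itself.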
So the closest we can get to "understanding" it fully is learning how it works and developing intuition around it. And I think we pretty much have that, at least among people in the field. Those who worked on training it especially have some intuitive understanding of what is going on; otherwise they would not know where to "test and hack".
It's math all the way down, but I feel like the angle some people used in the early days about "magic emergent properties" or "signs of consciousness" ended up making it seem more mystical than it is.