
Comment by jeremyjh

6 hours ago

We understand the low level details of how they are constructed. But we do not fully understand how higher-level behavior emerges - it is a subject of active research.

For example:

https://arxiv.org/html/2210.13382v5

https://arxiv.org/abs/2109.06129

We do understand, though; it is exactly what they were made for.

If you train it on a dataset of Othello games, or a dataset that includes them, you are basically creating a map of all the moves and states that have ever occurred, the odds of transitions between them, and which transitions are effective and which are not.

By querying it, you basically start navigating the map from a given spot, and it follows semi-randomly sampled highest-confidence transitions as it navigates "the map".
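A minimal sketch of what "semi-randomly sampled highest-confidence" navigation means in practice: softmax over the scores the model assigns to candidate next moves, sharpened by a temperature, then a weighted random draw. The move names and scores below are made up.

```python
import math
import random

def sample_next_move(logits, temperature=0.8, seed=None):
    """Sample one move from a model's raw output scores.

    Softmax turns scores into a distribution; temperature < 1
    favors (but does not guarantee) the highest-confidence move.
    """
    rng = random.Random(seed)
    moves = list(logits)
    scaled = [logits[m] / temperature for m in moves]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(moves, weights=weights, k=1)[0]

# Hypothetical scores for legal moves from some Othello position.
logits = {"d3": 2.1, "c4": 1.9, "f5": 0.3, "e6": -1.0}
move = sample_next_move(logits, temperature=0.5, seed=0)
```

Lower the temperature and the walk hugs the highest-weight transitions; raise it and the walk gets more random.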

And in the multidimensional cross-section of all these states and transitions, the existence of a "board map" is implied, as it is a set of common weights shared between all of them. It becomes even more obvious with the championship models in the Othello paper: they were trained on stronger games, in which the wider state of the board mattered more than the local one, so the overall board state mattered more for the responses.
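The "board map" evidence in the Othello paper comes from probes: small classifiers trained to read a board square's state out of the network's hidden activations. A toy sketch of the idea, with synthetic "activations" in which one direction linearly encodes a square (everything here is fabricated; a real probe reads actual transformer activations):

```python
import random

random.seed(0)
DIM = 16

# Pretend hidden states: one direction (index 3) linearly encodes
# whether a given board square is occupied; the rest is noise.
def fake_activation(occupied):
    vec = [random.gauss(0.0, 1.0) for _ in range(DIM)]
    vec[3] = (2.0 if occupied else -2.0) + random.gauss(0.0, 0.3)
    return vec

data = [(fake_activation(o), o) for o in [True, False] * 200]

# A linear probe: a perceptron trained to recover the square's state.
w = [0.0] * DIM
b = 0.0
for _ in range(10):
    for x, label in data:
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b > 0
        if pred != label:
            sign = 1.0 if label else -1.0
            w = [wi + sign * xi for wi, xi in zip(w, x)]
            b += sign

accuracy = sum(
    ((sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == label)
    for x, label in data
) / len(data)
```

If a simple linear readout recovers the square's state far above chance, that state is encoded in the activations; the paper does this per square to reconstruct the whole board.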

The second paper you linked also has a pretty obvious conclusion. It tells us more about us as humans than about LLMs: about our culture, our colors, and how we communicate their perception through text. If you want to try something similar, run kiki/bouba-style experiments on old diffusion models or old LLMs. A "Dzzkwok grWzzz" will get you much rougher and darker-looking things than "Olulola Opolili's" cloudy vibes.

The active research amounts to things like:

- probing and seeing "hey, let's check if the funky machine also does X"

- finding ways to scientifically verify and explain LLM behaviors we already know about

- pure BS in some cases

- academics learning about LLMs

None of this is proof of where our understanding/frontier is. It is basically standardizing and exploring the intuition that people who actively work with the models already have. It's like saying we don't understand math because people outside math circles still do not know all the behaviors and possibilities of a monoid.

  • @hypendev I am not trying to start a flame war, but let me take a very simple example.

    As someone else put it, we know how to build deep-learning machines. No question about that. My claim is that we don't clearly understand why they output the results we observe.

    Let's imagine that you have a model that can detect cats in an image, with 95% accuracy. If you understood how the model worked, I could give you an image of a cat and you could _predict_ reliably whether the model would detect the cat.

    Yet we are not able to do that: you have to give the image to the model and observe the result. We can't reliably (i.e. scientifically) predict the result, and we don't know how to train the model to detect the cat better without altering its other results. (Including the test image in the training set is, of course, forbidden.)
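    To make the point concrete, here is a toy sketch (the weights are entirely made up, not from any real model): even for a tiny two-layer network, the only general way to "predict" the output is to carry out the full forward computation.

```python
import math

# Made-up weights for a tiny 2-layer net: 3 inputs -> 2 hidden -> 1 output.
W1 = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
W2 = [1.2, -0.7]

def predict(x):
    """The only reliable way to 'predict' the model is to run the model."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]  # ReLU
    logit = sum(w * h for w, h in zip(W2, hidden))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid: "probability of cat"

p = predict([0.9, 0.1, 0.4])
```

    Scale this from a handful of weights to billions and the situation is the same: the forward pass is the prediction, and there is no cheaper mental shortcut.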

    Back to LLMs: we can't predict how they will behave. Therefore even world-class scientists at OpenAI, knowing about a Goblin issue and having hypotheses about its cause, are not able to edit the model directly to fix it. They would be able to if they understood it fully; instead they are reduced to testing and hacking their way through.

    • Sorry if it sounded like that; I'm not trying to have a flame war, just trying to understand which part it is that we don't _understand_, as it seems silly to me.

      Yeah, we cannot predict the results of a model with 100% accuracy, at least not mentally: to do that we would have to do the same math in our heads, and that would take ultra-rare, next-level intelligence. We could build a reliable predictor, but a reliable prediction model of a model's results would end up being the same model.

      So the closest we can get to "understanding" it fully is learning how it works and developing intuition around it. And I think we pretty much have that, at least among people in the field. Those who worked on training it especially have some intuitive understanding of what is going on; otherwise they would not know where to "test and hack".

      It's math all the way down, but I feel like the angle some people took in the early days, about "magic emergent properties" or "signs of consciousness", ended up making it seem more mystical than it is.