Comment by SwellJoe

3 hours ago

"What explains the emergent abilities of generative pre-trained transformers at massive-scale? Abilities that the smaller GTP’s don’t possess."

What "emergent" abilities do you mean? In my experience, smaller models behave exactly as I would expect a model with a lot fewer data and fewer connections between the data to behave. It is a difference of scale and not of kind when comparing Gemma 4 E2B (which runs on literally any modern computing device, including a CPU in a modest laptop or phone) to the current frontier models. Each step up adds more knowledge of how to do more things, and more working memory and tool capability to do more, but it does not look anything like a line being crossed into sentience, to me. They all still seem like machines. If you compare outputs across each step up in size and capability, which is something I've done, you'll see incremental improvements. You won't see a sudden spark where it's a different type of thing, it's just gradually getting more capable.

I think the memory features companies are sticking on these things are detrimental to mental health. They add to the illusion that something else is happening beyond some equations being calculated with some randomness thrown in. But it's just the model querying the memory database (whatever form that takes) because it's been instructed to do so. The model doesn't want to know anything about who it's talking to; it's just following the system prompt. That doesn't make it your friend. Humans will see a face on the moon; that doesn't mean the moon is my friend, either.
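
To make that concrete, here's roughly what a "memory" feature amounts to mechanically. This is a toy sketch in Python; every name in it is hypothetical, not any vendor's actual API:

    # A "memory" feature, mechanically: a lookup whose results get
    # pasted into the prompt as more tokens. Nothing "wants" anything.
    MEMORY_DB: dict[str, list[str]] = {}  # user_id -> stored facts

    def remember(user_id: str, fact: str) -> None:
        """Store a fact the system prompt told the model to extract."""
        MEMORY_DB.setdefault(user_id, []).append(fact)

    def build_prompt(user_id: str, user_message: str) -> str:
        """Prepend stored facts to the context before calling the model."""
        facts = MEMORY_DB.get(user_id, [])
        memory_block = "\n".join(f"- {f}" for f in facts)
        return (
            "System: Use these stored facts about the user:\n"
            f"{memory_block}\n\n"
            f"User: {user_message}"
        )

    remember("alice", "Prefers short answers.")
    print(build_prompt("alice", "What's the weather like?"))

The "personal touch" is string concatenation: the model sees a few extra lines of context because the system prompt says to fetch them, not because it's curious about you.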