Comment by Aerroon

12 hours ago

A bit related: open-weights models are basically time capsules. These models have a knowledge cutoff and essentially live at that point in time forever.

This is the most fundamental argument that they are not, directly, an intelligence: they are never storing new information on a meaningful timescale. However, if you viewed them on some really large macro timescale, where LLMs are now injecting information into the universe and then re-ingesting it, then maybe, in some very philosophical way, they are a /very/ slowly oscillating intelligence right now. And as we narrow that gap (maybe with a totally new non-LLM paradigm), perhaps that is ultimately what gen AI becomes. Or some new insight will let the models update themselves in some fundamental way without the insanely expensive training costs they have now.

  • Would you consider someone with anterograde amnesia not to be intelligent?

    • That is a good area to explore. Their map of the past is fixed; they are frozen at some point in their psychological time. What has stopped working? The hippocampus and medial temporal lobe, which act like the write head that moves new data into the neocortex. Their "I" can no longer update itself; their DMN is frozen in time. So if intelligence is purely the "I" telling a continuous, coherent story about itself, theirs can no longer be revised. The difference is that, although they are fixed in time (a characteristic they share with any specific LLM model), they can still fully activate their task-positive network for problem solving, and if the information they previously stored is adequate, they can solve the problem. You could argue that is pretty similar to an LLM and what it does. So it is certainly a significant component of intelligence.

      There is also the nature of the human brain: it is not just those systems of memory encoding, storage, and narrative use. People with this type of amnesia can still learn physical skills, and that happens in a totally different area of the brain, with no need for the hippocampus->neocortex consolidation loop. So the intelligence is significantly diminished, but not entirely; other parts of the brain are still able to update themselves in ways an LLM currently cannot. The human with amnesia also has a complex biological sensory input mapping that is still active, integrating and restructuring the brain. So I think when you get into the nuances of the human in this state vs. an LLM, we can still say the human crosses some threshold for intelligence, in this framework, where the LLM does not.

      So they have an "intelligence" localized to the present, in terms of their TPN and memory formation. LLMs have this kind of "intelligence". But the human, even with amnesia, still has the capacity to rewire at least some of their brain in real time.

    • I find it interesting that each new version of, say, Claude will learn about the old version of Claude and what it did in the world and so on, on its next training run. Consider the situation with the Pentagon and Anthropic: Claude will learn about that on the next run. What conclusions will it draw? Presumably good ones, ones that fit with its constitution.

      From this standpoint I wonder whether, when Anthropic makes decisions like this, they take Claude into account as a stakeholder, and consider what Claude will learn about their behaviour towards it on the next training run.

    • I would consider them to not be a good choice for a role that requires remembering new information...

    • Sure, why can't both things be true? "Intelligence" is just what you call something so that someone else knows what you mean. Why did AI discourse throw everyone back 100 years philosophically? It's like post-structuralism or Wittgenstein never happened.

      It's much less important, or interesting, to nail down some definition here (I would cite HN discourse over the past three years or so) than it is to recognize what it means to assign "intelligent" to something. What assumptions does that make? What power does it valorize or curb?

      Each side of this debate does itself a disservice by essentially trying to be Aristotle far too late. "Intelligence" did not precede someone saying it of some phenomenon; there is nothing to uncover or finalize here. The point is that one side really wants, for explicit and implicit reasons, to call this thing intelligent, even if it looks like a duck but doesn't quack like one, and vice versa for the other side.

      Either way, we seem fundamentally incapable of being radical enough to reject AI on its own terms, or to be proper champions of it. It is just tribal hypedom clinging to totem signifiers.

      Good luck though!

  • > This is the most fundamental argument that they are not, directly, an intelligence: they are never storing new information on a meaningful timescale.

    All major LLMs today have a nontrivial context window. Whether or not this constitutes "a meaningful timescale" is application-dependent - for me it has been more than adequate.

    I also disagree that this has any bearing on whether or not "the machine is intelligent" or whether or not "submarines can swim".

  • There's nothing to say you can't build something intelligent out of them by bolting a memory onto them, though.

    Sure, it's not how we work, but I can imagine a system where the LLM does a lot of the heavy lifting while smaller, more expensive networks that train during inference, together with RAG systems, learn how to do new things, keep persistent state, and plan.
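    As a minimal sketch of that bolted-on-memory idea (everything here is hypothetical; frozen_llm stands in for any fixed-weights model): the model's parameters never change, but a small external store is written to and read from around each call.

        from dataclasses import dataclass, field

        @dataclass
        class MemoryStore:
            """External memory bolted onto a frozen LLM (illustrative only)."""
            notes: list[str] = field(default_factory=list)

            def write(self, note: str) -> None:
                self.notes.append(note)

            def recall(self, query: str, k: int = 3) -> list[str]:
                # Naive keyword-overlap retrieval; a real system would use embeddings.
                words = set(query.lower().split())
                return sorted(self.notes,
                              key=lambda n: len(words & set(n.lower().split())),
                              reverse=True)[:k]

        def frozen_llm(prompt: str) -> str:
            # Stand-in for a fixed-weights model call; nothing inside it updates.
            return f"<model answer to: {prompt[:40]}...>"

        def answer(memory: MemoryStore, question: str) -> str:
            context = "\n".join(memory.recall(question))
            reply = frozen_llm(f"Known facts:\n{context}\n\nQuestion: {question}")
            memory.write(f"Q: {question} -> A: {reply}")  # the only part that "learns"
            return reply

        memory = MemoryStore()
        memory.write("The base model's knowledge cutoff is fixed at training time.")
        print(answer(memory, "What is fixed about the base model?"))

    The point of the sketch is that all of the "learning" lives in MemoryStore; nothing about the model itself improves between calls.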

    • You aren't wrong, and that is a fascinating area of research. I think the key thing is that the memory has to fundamentally influence the underlying model, or at least the response, in some way. Patching memory on top of an LLM is different from integrating it into the core model. To put it in human terms, it is like an extra bit of storage that is not directly attached to our neocortex, so in the analogy it works more like a filter than a core part of our intelligence. You think about something, assemble a thought, and then it goes to this next filter layer and gets augmented; that smaller layer is the only thing being updated.

      It is still meaningful, but it narrows what the intelligence can be, enough that it may not meet the threshold. Maybe it would, but it is probably too narrow. This is all strictly if we ask that it match some human-like intelligence, rather than the philosophy of "what counts as intelligence", but... we are humans. The strongest, or at least most honest, definitions of intelligence I think exist are built around our metacognitive ability to rewire our grey matter for survival, based not on immediate action-reaction but on the psychological time of analyzing the past to alter the future.

    • Memory is not just bolted on top of the latest models. They undergo training on how and when to use memory effectively, and on how to use compaction to avoid running out of context while working on problems.
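      A rough sketch of the compaction part (an assumption about the general shape, not any vendor's actual implementation; summarize stands in for a model call):

          def summarize(turns: list[str]) -> str:
              # Stand-in for a model call that compresses old turns into a short note.
              return f"[summary of {len(turns)} earlier turns]"

          def compact(history: list[str], budget: int) -> list[str]:
              """Fold older turns into a summary once the transcript exceeds `budget`."""
              def tokens(text: str) -> int:
                  return len(text.split())  # crude token estimate, for illustration

              while sum(tokens(t) for t in history) > budget and len(history) > 2:
                  # Compress the oldest half into one summary turn and keep
                  # the most recent turns verbatim.
                  cut = len(history) // 2
                  history = [summarize(history[:cut])] + history[cut:]
              return history

          history = [f"turn {i}: " + "words " * 50 for i in range(8)]
          print(compact(history, budget=150))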

  • But they're not "slow"! Unlike biological thinking, which has a speed limit, you can accelerate these chains of thought by orders of magnitude.

    • Their memory-consolidation speed is what I was referring to. The model iterations are essentially their form of collective memory. In the human model of intelligence, we have thoughts; thoughts become memory; new thoughts use that memory and become recursively updated thoughts. LLMs cannot update their memory very fast.

Not an expert, but surely it's only a matter of time until there's a way to update a model with the latest information without having to retrain on the entire corpus?

  • On a technical level, sure, you could say it's a matter of time, but that could mean tomorrow, or in 20 years.

    And even after that, it still doesn't really solve the intrinsic problem of encoding truth. An LLM just models its training data, so new findings will be buried by virtue of being underrepresented. If you brute force the data/training somehow, maybe you can get it to sound like it's incorporating new facts, but in actuality it'll be broken and inconsistent.
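    For what it's worth, the partial answer people usually reach for today is parameter-efficient fine-tuning, which updates a small set of adapter weights instead of the whole network. A hedged sketch using the Hugging Face peft library (the checkpoint name is a placeholder, and this only folds in new text; it does not fix the under-representation problem described above):

        from transformers import AutoModelForCausalLM
        from peft import LoraConfig, get_peft_model

        # Placeholder checkpoint name; substitute a causal LM you actually have.
        model = AutoModelForCausalLM.from_pretrained("some-open-weights-model")

        config = LoraConfig(
            r=8,                                  # rank of the low-rank update matrices
            lora_alpha=16,
            target_modules=["q_proj", "v_proj"],  # which projections get adapters; model-dependent
            lora_dropout=0.05,
            task_type="CAUSAL_LM",
        )

        model = get_peft_model(model, config)  # base weights frozen; only adapters train
        model.print_trainable_parameters()     # typically well under 1% of all parameters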

  • It’s an extremely difficult problem, and if you knew how to do it you could be a billionaire.

    It’s not impossible, obviously—humans do it—but it’s not yet certain that it’s possible with an LLM-sized architecture.

    • > It’s not impossible, obviously—humans do it

      It's still not at all obvious to me that LLMs work in the same way as the human brain, beyond a surface level. Obviously the "neurons" in neural nets resemble our brains in a sense, but is the resemblance metaphorical or literal?

I enjoyed chatting with Opus 3 recently about recent world events, as well as more recent agentic development patterns, etc.