
Comment by sailingparrot

3 months ago

> They rebuild the "long term plan" anew for every token

Well, no: attention in the LLM allows it to look back at its "internal thought" from the previous tokens.

Token T at layer L can attend to projections of the hidden states of all tokens < T at the same layer L. So it's definitely not starting anew at every token, and it can iterate on an existing plan.

It's not a perfect mechanism, for sure, and there is work on letting LLMs carry more information forward (e.g. Feedback Transformers), but they can definitely do some of that today.
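
To make that concrete, here's a minimal sketch in plain NumPy (single head, random stand-in weights, no training; all names are mine, not from any particular library) of what it means for token T to attend to projections of earlier tokens' hidden states at the same layer:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 16                                 # hidden size (illustrative)
    T = 5                                  # tokens processed so far

    H = rng.normal(size=(T, d))            # layer-L hidden states for tokens 0..T-1
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
    Q, K, V = H @ Wq, H @ Wk, H @ Wv       # projections of the same layer's states

    scores = Q @ K.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -np.inf                 # causal mask: token t sees only tokens <= t

    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)     # row-wise softmax
    out = w @ V                            # row t mixes earlier tokens' projected states
    print(out.shape)                       # (5, 16)

The last row of out (the newest token) is a weighted mix of projections of every earlier token's hidden state, which is the sense in which the "plan" carries forward rather than being rebuilt from nothing.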

This isn't the same as planning. Consider what happens when tokens from another source are appended.

  • I don't follow how this relates to what we are discussing. Autoregressive LLMs can plan within a single forward pass and can look back at their previous reasoning; they do not start anew at each token, as you claimed.

    If you append tokens from another source, as in a turn-based conversation, the LLM processes all the newly appended tokens in parallel while still being able to look back at its previous internal state (and thus its past reasoning/planning in latent space) from the already-processed tokens, then adjusts the plan based on the new information (see the sketch below).

    What happens to you, as a human, if you come up with a plan with limited information and new information is then provided to you?
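
    Here is a toy continuation of the sketch above showing that idea with a hand-rolled key/value cache (an assumption for illustration; variable names are my own): the already-processed tokens' keys and values are kept, and only the appended tokens are computed, in parallel, while attending to that cached state.

      import numpy as np

      rng = np.random.default_rng(0)
      d = 16
      Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

      def attend(H_new, K_cache, V_cache):
          # Project only the appended tokens; reuse the cached keys/values.
          Q = H_new @ Wq
          K = np.vstack([K_cache, H_new @ Wk])    # old + new keys
          V = np.vstack([V_cache, H_new @ Wv])    # old + new values
          T_old, T_new = K_cache.shape[0], H_new.shape[0]
          scores = Q @ K.T / np.sqrt(d)
          # An appended token sees every old token, plus new tokens up to itself.
          mask = np.triu(np.ones((T_new, T_new), dtype=bool), k=1)
          scores[:, T_old:][mask] = -np.inf
          w = np.exp(scores - scores.max(axis=-1, keepdims=True))
          w /= w.sum(axis=-1, keepdims=True)
          return w @ V, K, V                      # outputs + updated cache

      H_old = rng.normal(size=(5, d))             # 5 tokens already processed
      K_cache, V_cache = H_old @ Wk, H_old @ Wv   # kept around between turns
      H_new = rng.normal(size=(3, d))             # 3 appended tokens (new turn)
      out, K_cache, V_cache = attend(H_new, K_cache, V_cache)
      print(out.shape)                            # (3, 16), computed in one pass

    The point of the toy is just that the cached K/V from the earlier tokens is state the model keeps and re-reads; nothing about the old tokens is recomputed when the new ones arrive.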

    • Not the original person you are replying to, but I wanted to add:

      Yes, they can plan within a single forward pass like you said, but I still think they "start anew at each token" because they have no state/memory that is not the output.

      I guess this comes down to differing interpretations of "start anew", but personally I would agree that having no internal state, and simply looking back at its previous output to form a new token, is "starting anew".

      But I'm also not well informed about the topic, so happy to be corrected.
