Comment by metalspot

2 days ago

> glorified autocomplete machine

It is a next token prediction function and it is important to understand the technology accurately based on what it actually is.

What makes a next token prediction function interesting, though, is that every computer program is just a string of instructions. At the theoretical limit, a next token prediction function can generate the entire instruction stream (boot loader, OS, application), so it can theoretically generate any computer program, which makes it a universal predictor for anything that a computer can simulate. Still not AGI/ASI in the woo-woo non-technical interpretations of those terms, but incredibly powerful.
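
To make "generate the instruction stream one token at a time" concrete, here is a toy sketch. It assumes a bigram frequency table as a stand-in for the model, and the instruction names (`load_a`, `store_c`, and so on) are made up for illustration. A real LLM conditions on far more context, but the feed-the-output-back-in loop has the same shape.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count which token follows which; this table is the whole 'model'."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    """Greedy next-token prediction: the most frequent successor, or None."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]

def generate(counts, start, max_tokens=10):
    """Apply the predictor repeatedly, feeding each output back in as input."""
    out = [start]
    while len(out) < max_tokens:
        nxt = predict_next(counts, out[-1])
        if nxt is None:  # nothing ever followed this token in training
            break
        out.append(nxt)
    return out

# Hypothetical "instruction stream" used as training data.
corpus = "load_a load_b add store_c halt".split()
model = train_bigram(corpus)
program = generate(model, "load_a")
print(program)
```

Note that this toy can only replay successors it has seen, so it stops as soon as it reaches a token with no recorded successor.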

What you’re saying is correct if the model is trained on all the knowledge humanity has had, has now, and will ever produce. But at the moment, next token prediction is quite limited by the training data.

Things could change if the model supported reinforcement learning. That way the LLM would update its weights in real time based on a feedback loop, but that could either vastly improve the quality of the token prediction or completely degrade it.

  • The distinction I would make here is that computer code is logical transformations on arbitrary data, not the actual data itself. An LLM can learn the entire space of logical transformation patterns from existing code, and can hallucinate new logical transformations, using a computer as a validator for the logic, so an LLM can create new logic as well as repeat existing patterns, and that logic can be applied to novel input data that the LLM has never seen before.

    • That’s not how LLMs work at the moment, as far as I understand. An LLM would not hallucinate a new logical transformation; it would just predict a transformation from its training data.

      I understand that the logical transformations in the training data can be combined in many different ways. But the number of combinations is still finite, and I would assume that a large number of those combinations would not produce any meaningful outcome.

      The best outcome is that it predicts a new pattern we haven’t discovered yet (the LLM happens to connect the right dots); one example is protein folding.
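
The "computer as a validator" idea raised earlier in this thread can be sketched as a generate-and-check loop. This is a minimal illustration, not how any particular system works: the `candidates` list is a hypothetical stand-in for expressions sampled from a model, and `validate` is an assumed helper name.

```python
def compile_candidate(candidate_src):
    """Turn an expression in x into a function. Not a security sandbox."""
    return eval("lambda x: " + candidate_src, {"__builtins__": {}})

def validate(candidate_src, examples):
    """Accept a candidate only if it runs and matches every example."""
    try:
        fn = compile_candidate(candidate_src)
        return all(fn(x) == y for x, y in examples)
    except Exception:
        # Malformed or crashing candidates are simply rejected.
        return False

# Hypothetical model samples; the last one is deliberately malformed.
candidates = ["x + 2", "x * 2", "x ** 2", "x *"]
examples = [(1, 1), (2, 4), (3, 9)]

accepted = [c for c in candidates if validate(c, examples)]

# The surviving logic can then be applied to an input outside the examples.
novel_result = compile_candidate(accepted[0])(12)
print(accepted, novel_result)
```

Most candidates are rejected here, which matches the point that many combinations lead nowhere; the validator is what filters out the few that hold up.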