Comment by D-Machine

19 days ago

> In practice it would make heavy use of RL, as humans do.

Oh, so you mean it would be in a harness of some sort that lets it connect to sensors that tell it things about its position, speed, balance, and so on? Well, yes, but then it isn't an LLM anymore, because it has more than language to model things!

Can’t we claim the sensor data (x=5, y=9, …) is text too?

Not sure plain text is a great representation, but it would be better if the model could somehow understand the position internally.

  • No, we can't; words have meanings. The middle "L" in LLM refers to natural language, and trying to redefine everything as "language" doesn't mean an LLM can magically do everything.

    In particular, sensor data doesn't have the same semantics or structure as language at all. It is continuous and should not be tokenized; it is multi-channel (multiple parallel streams) where text is single-channel; and the outputs need to be multi-channel and realtime as well, so it is unclear whether the LLM's autoregressive approach is appropriate or can work at all. The lack of contextuality and interdependency, both within and between these streams, might even mean that self-attention is not that helpful here and is just computationally wasteful. Something said thousands of tokens ago can be completely relevant and change the meaning of tokens being generated now, but any bike sensor data from more than about a second ago is completely irrelevant to all future needed outputs.

    Sure, a transformer might still do well processing this data, but an LLM literally can't. It would require significant architectural changes just to be able to accept the inputs and produce the outputs.

    • Ok, maybe let’s talk about Claude, Gemini, ChatGPT, whatever they are. I don’t care about your definition of LLM; let’s talk about current edge models.

      1 reply →
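The sensor-data objections in the thread above can be sketched concretely. This is a rough Python illustration with entirely made-up numbers (sample values, a 100 Hz rate, a one-second relevance window are all assumptions, not anything from the discussion): serializing continuous readings as text inflates the input into subword tokens with no numeric semantics, and full self-attention over a sensor stream pays for history that, per the argument above, can never matter.

```python
import struct

# One multi-channel sample, as a harness might report it (made-up values).
reading = {"x": 5.02381, "y": 9.11704, "roll": -0.0312}

# As text, each sample costs dozens of characters, which a tokenizer then
# chops into subword pieces that carry no numeric meaning.
as_text = ",".join(f"{k}={v}" for k, v in reading.items())

# As raw floats, the same sample is a 3-dim continuous vector: 12 bytes.
as_floats = struct.pack("3f", *reading.values())
print(len(as_text), len(as_floats))  # 32 vs 12

# Attention cost: if only the last ~1 s of a 100 Hz stream matters, full
# self-attention over a minute of history does ~60x redundant work.
rate_hz, history_s = 100, 60
n = rate_hz * history_s                 # 6000 timesteps in context
full_pairs = n * n                      # every step attends to every step
windowed_pairs = n * (rate_hz * 1)      # attend to the last second only
print(full_pairs // windowed_pairs)     # 60
```

None of this says a transformer can't handle such streams, only that the text-in, text-out framing is a poor fit for them.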