Comment by skeptic_ai
18 days ago
Can’t we claim the sensor data (x=5, y=9, …) is text too?
Not sure plain text is great for this, but it would be better if the model could understand the position internally somehow.
No, we can't. Words have meanings: the middle "L" in LLM refers to natural language, and redefining everything as "language" doesn't mean an LLM can magically do everything.
In particular, sensor data doesn't have the same semantics or structure as language at all. It is continuous and should not be tokenized; it is multi-channel (multiple simultaneous streams) where text is single-channel; and the outputs need to be multi-channel and real-time as well, so it is unclear whether the LLM's recursive, token-by-token approach can work at all or is appropriate. The lack of contextuality and interdependency, both within and between these streams, might even mean that self-attention is not just unhelpful but computationally wasteful here: something said thousands of tokens ago can be completely relevant and change the meaning of tokens being generated now, but bike sensor data from more than about a second ago is completely irrelevant to all future needed outputs.
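To put a rough number on the waste (all figures below are illustrative assumptions, not measurements from any real system): if only about a second of sensor history matters, full self-attention over a long buffered context computes vastly more pairwise scores than a one-second local window would need.

```python
# Back-of-envelope comparison of attention score counts.
# Every number here is an assumption chosen for illustration.

rate_hz = 1000           # hypothetical sensor sample rate
history_s = 60           # one minute of buffered context
n = rate_hz * history_s  # sequence length: 60,000 steps

full_scores = n * n            # full self-attention: O(n^2) pairwise scores
window = rate_hz               # ~1 second of relevant history
local_scores = n * window      # windowed attention: O(n * k)

print(full_scores // local_scores)  # prints 60: 60x more score work for full attention
```

The ratio grows linearly with the context length, so buffering an hour instead of a minute makes full attention 3600x more expensive than the window that actually matters.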
Sure, maybe a transformer might still do well processing this data, but an LLM literally can't. It would require significant architectural changes just to be able to accept the inputs and make the outputs.
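A minimal sketch of what "multi-channel with only short-range dependencies" looks like in practice (the channel count, sample rate, and filter are all made up for illustration): each continuous channel is filtered causally with a receptive field of roughly one second, producing multi-channel real-time output with no long-range dependencies at all.

```python
import random

# Hypothetical sensor stream: 4 continuous channels (say x, y, speed, tilt)
# sampled at 100 Hz for 10 seconds. These numbers are assumptions.
channels = 4
rate_hz = 100
seconds = 10
T = rate_hz * seconds
stream = [[random.gauss(0, 1) for _ in range(T)] for _ in range(channels)]

window = rate_hz  # receptive field of ~1 second, per the relevance argument

def causal_avg(ch, w):
    # Causal moving average: output at time t depends only on samples
    # from t-w+1 .. t, never on the future.
    out = []
    s = 0.0
    for t, v in enumerate(ch):
        s += v
        if t >= w:
            s -= ch[t - w]
        out.append(s / min(t + 1, w))
    return out

filtered = [causal_avg(ch, window) for ch in stream]
print(len(filtered), len(filtered[0]))  # prints: 4 1000 (multi-channel in, multi-channel out)
```

Nothing here resembles an autoregressive token loop: inputs and outputs are parallel continuous streams, which is the structural mismatch being described.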
OK. Maybe let’s talk about Claude, Gemini, ChatGPT, whatever they are. I don’t care about your definition of LLM; let’s talk about current edge models.
People who don't care about definitions are always in over their heads and desperately trying to save face.
Current edge models literally can't do what I said, because they are single-channel and either recursively applied transformers or, most likely, Vision Transformers (setting aside generative image/video models, which are generally diffusion-based). The architectures are all wrong for handling certain kinds of data, tokenization is deeply harmful for continuous data (it destroys the relative meaning of magnitudes), and self-attention can be hugely expensive and encourage over-fitting when the problem has no relevant long-term dependencies.
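The magnitude point can be demonstrated with a toy character-level tokenizer (a made-up stand-in for a real BPE vocabulary, for illustration only): readings that are numerically almost identical can be far apart in token space, and numerically distant readings can be token-for-token nearly identical.

```python
def toy_tokenize(text):
    # Character-level "tokenizer": each character becomes a discrete
    # symbol ID with no numeric meaning (stand-in for a real vocab).
    return [ord(ch) for ch in text]

def token_hamming(a, b):
    # Count positions where two token sequences disagree, padding the
    # shorter one so a length mismatch also counts as disagreement.
    n = max(len(a), len(b))
    a = a + [-1] * (n - len(a))
    b = b + [-1] * (n - len(b))
    return sum(x != y for x, y in zip(a, b))

# Numerically, 9.99 is 0.01 away from 10.00 but 4.0 away from 5.99.
close_pair = token_hamming(toy_tokenize("x=9.99"), toy_tokenize("x=10.00"))
far_pair = token_hamming(toy_tokenize("x=9.99"), toy_tokenize("x=5.99"))

# In token space the ordering inverts: the numerically adjacent pair
# differs in 5 token slots, the numerically distant pair in only 1.
print(close_pair, far_pair)  # prints: 5 1
```

A continuous input representation keeps the metric structure for free; a discrete vocabulary has to relearn it, if it can at all.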
Try actually reading some papers and building some models using e.g. PyTorch. You can't understand these things by metaphor and analogy.