Comment by D-Machine

15 days ago

People who don't care about definitions are always in over their heads and desperately trying to save face.

Current edge models literally can't do what I said, because they are single-channel: either recursively applied transformers or, most likely, Vision Transformers (setting aside generative image/video models, which are generally diffusion-based). The architectures are simply wrong for certain kinds of data. Tokenization is deeply harmful when dealing with continuous data, because it destroys the relative meaning of magnitudes, and self-attention can be hugely expensive and encourage over-fitting when long-term dependencies are not relevant to the problem.
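Both claims can be sketched with toy numbers (the three-entry vocab and embedding dimension below are purely illustrative assumptions, not any real tokenizer):

```python
import math
import random

random.seed(0)

# (1) Tokenizing continuous data: a hypothetical digit-string vocab.
# Token IDs, and their embeddings (random at initialization), carry no
# notion of numeric magnitude -- "9.9" is no closer to "10.0" than to "0.1".
vocab = ["0.1", "9.9", "10.0"]
dim = 8
emb = {tok: [random.gauss(0, 1) for _ in range(dim)] for tok in vocab}

def dist(a, b):
    """Euclidean distance between two token embeddings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(emb[a], emb[b])))

# Both distances are of the same order (~sqrt(2 * dim)), uncorrelated with
# the actual numeric gap; magnitude has to be re-learned from data.
print(dist("9.9", "10.0"), dist("9.9", "0.1"))

# (2) Self-attention cost grows quadratically in sequence length n:
# the QK^T and AV products are each O(n^2 * d) multiply-adds.
def attn_flops(n, d):
    return 2 * (n * n * d)  # QK^T plus AV; softmax and projections ignored

print(attn_flops(2048, 64) / attn_flops(512, 64))  # 16.0 -- 4x length, 16x cost
```

When the task has no long-range dependencies, that quadratic budget buys pairwise interactions the problem doesn't need, which is exactly the over-fitting risk above.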

Try actually reading some papers and building some models using, e.g., PyTorch. You can't understand these things by metaphor and analogy.