Comment by skeptic_ai

18 days ago

Ok. Maybe let’s talk about Claude, Gemini, ChatGPT, whatever they are. I don’t care about your definition of llm, let’s talk current edge models

People who don't care about definitions are always in over their heads and desperately trying to save face.

Current edge models literally can't do what I said, because they are single-channel and either recursively applied transformers or, most likely, Vision Transformers (setting aside generative image/video models, which are generally diffusion-based). The architectures are simply wrong for certain kinds of data: tokenization is deeply harmful for continuous data (it destroys the relative meaning of magnitudes), and self-attention can be hugely expensive and can encourage over-fitting when the problem has no relevant long-range dependencies.
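To make the tokenization point concrete, here is a minimal sketch (not from the comment; the toy vocabulary and IDs are made up to mimic a BPE-style tokenizer): subword tokenizers map strings to arbitrary integer IDs, so numerically adjacent values can land on distant IDs while numerically distant values land nearby, erasing magnitude information before the model ever sees it.

```python
# Toy vocabulary imitating a BPE-style tokenizer.
# The IDs are arbitrary and hypothetical -- that is the point:
# nothing about them preserves numeric order or distance.
vocab = {"1": 401, "2": 17, "10": 940, "100": 3, "101": 2215}

def token_distance(a: str, b: str) -> int:
    """Distance in token-ID space between two number strings."""
    return abs(vocab[a] - vocab[b])

# Numeric distance 1, but token IDs are far apart:
print(token_distance("100", "101"))  # 2212
# Numeric distance 99, but token IDs are comparatively close:
print(token_distance("1", "100"))    # 398
```

Feeding the raw floats directly (as, say, a regression model would) preserves exactly the relative magnitudes that token IDs scramble.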

Try actually reading some papers and building some models using e.g. PyTorch. You can't understand these things by metaphor and analogy.