Comment by skeptic_ai

18 days ago

Ok. Maybe let’s talk about Claude, Gemini, ChatGPT, whatever they are. I don’t care about your definition of llm, let’s talk current edge models

People who don't care about definitions are always in over their heads and desperately trying to save face.

Current edge models literally can't do what I said, because they are single-channel and either recursively applied transformers or, most likely, Vision Transformers (setting aside generative image/video models, which are generally diffusion-based). The architectures are simply wrong for certain kinds of data: tokenization is deeply harmful for continuous data (it destroys the relative meaning of magnitudes), and self-attention can be hugely expensive and can encourage over-fitting when the problem has no relevant long-range dependencies.
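To make the tokenization point concrete, here is a minimal sketch (not from the comment; the toy vocabulary and IDs are made up to mimic a BPE-style tokenizer): subword tokenizers map strings to arbitrary integer IDs, so numerically adjacent values can land on distant IDs while numerically distant values land nearby, erasing magnitude information before the model ever sees it.

```python
# Toy vocabulary imitating a BPE-style tokenizer.
# The IDs are arbitrary and hypothetical -- that is the point:
# nothing about them preserves numeric order or distance.
vocab = {"1": 401, "2": 17, "10": 940, "100": 3, "101": 2215}

def token_distance(a: str, b: str) -> int:
    """Distance in token-ID space between two number strings."""
    return abs(vocab[a] - vocab[b])

# Numeric distance 1, but token IDs are far apart:
print(token_distance("100", "101"))  # 2212
# Numeric distance 99, but token IDs are comparatively close:
print(token_distance("1", "100"))    # 398
```

Feeding the raw floats directly (as, say, a regression model would) preserves exactly the relative magnitudes that token IDs scramble.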

Try actually reading some papers and building some models using e.g. PyTorch. You can't understand these things by metaphor and analogy.