Comment by cshimmin
8 days ago
I wouldn't conflate MLPs with transformers; an MLP is a small building block of almost any standard neural architecture (excluding spiking/neuromorphic types).
But to your point, the trend toward increasing inference-time compute costs, ushered in by CoT/reasoning models, is one good reason to look for equally capable models that can be optimized for inference efficiency. Traditionally, training was the main compute cost, so it's reasonable to ask if there's unexplored space there.
What I meant by "NNs and Transformers" is that once we've found the magical ingredient (and we have found it), people tend to all focus on the same area of research. Mankind just got kinda lucky that all this can run on what are essentially game graphics boards!