Comment by quantadev

8 days ago

Right now, as long as the rocket's heading straight up, everyone's on board with MLPs (Multilayer Perceptrons/Transformers)! Why not stay on the same rocket for now!? We're almost at AGI already!

I wouldn't conflate MLPs with transformers; an MLP is a small building block of almost any standard neural architecture (excluding spiking/neuromorphic types).
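For context, here's a minimal sketch of what "MLP" refers to here: the two-layer feed-forward sub-block found inside essentially every transformer layer (PyTorch; the dimensions and the GELU nonlinearity are illustrative assumptions, not any specific model's):

```python
import torch.nn as nn

class MLP(nn.Module):
    """Two-layer feed-forward block: the small building block in question."""
    def __init__(self, d_model=512, d_hidden=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),  # expand
            nn.GELU(),                     # pointwise nonlinearity
            nn.Linear(d_hidden, d_model),  # project back down
        )

    def forward(self, x):
        return self.net(x)
```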

But to your point, the trend toward increasing inference-time compute costs, ushered in by CoT/reasoning models, is one good reason to look for equally capable models that can be optimized for inference efficiency. Traditionally, training was the main compute cost, so it's reasonable to ask whether there's unexplored space there.

  • What I meant by "NNs and Transformers" is that once we've found the magical ingredient (and we've found it), everyone tends to pile into the same area of research. Mankind just got kinda lucky that all this can run on what are essentially gaming graphics cards!

Why are you conflating MLPs in general with transformers specifically?

  • I consider MLPs the building blocks of all this; they're what makes something a neural net, as opposed to some other data structure.

    • Sure. But that isn’t a reason to conflate the two?

      OP wasn’t suggesting looking for an alternative/successor to MLPs, but for an alternative/successor to transformers (while presumably still using MLPs), in the same way that transformers are an alternative/successor to LSTMs.
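      To make that concrete, here's a minimal sketch of a pre-norm transformer-style layer (PyTorch; layer names, dimensions, and the 4x MLP expansion are illustrative assumptions). The attention "mixer" is the transformer-specific part; a successor architecture could swap it out while keeping the MLP sub-block intact:

      ```python
      import torch.nn as nn

      class Block(nn.Module):
          """One transformer-style layer: a token-mixing sublayer plus an MLP.

          Swapping out `mixer` (here, self-attention) is the kind of
          successor-to-transformers change being discussed; the MLP
          sublayer would presumably stay.
          """
          def __init__(self, d_model=512, n_heads=8):
              super().__init__()
              self.norm1 = nn.LayerNorm(d_model)
              self.mixer = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
              self.norm2 = nn.LayerNorm(d_model)
              self.mlp = nn.Sequential(  # the MLP building block again
                  nn.Linear(d_model, 4 * d_model),
                  nn.GELU(),
                  nn.Linear(4 * d_model, d_model),
              )

          def forward(self, x):
              h = self.norm1(x)
              x = x + self.mixer(h, h, h, need_weights=False)[0]  # attention + residual
              x = x + self.mlp(self.norm2(x))                     # MLP + residual
              return x
      ```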
