Comment by bbor
3 months ago
Hmm that's a good point, but IMO the distinction isn't sharp enough to make a big deal over. The core idea of SoM as I see it is that human cognition is often quite decentralized, and that any illusion of a unified self is constructed piecemeal from the outputs of smaller, less-aware subsystems. Generally it's expected that the subsystems communicate with each other, yes, but I think "disproportionately rely on one or two members for complex questions but act like you're unified overall" still fits the bill.
The opinion I formed during the first few months after GPT-4's release was that the Society of Mind hypothesis was being disproved by the "maximalist" approach some were taking to build a true AGI. It turned out that composing many LLMs into a cognitive architecture, where each one had a specific purpose (memory, planning, etc.), didn't scale.
On the same note, I suggest the following experiment: train a transformer by "slicing" it into groups of layers and forcing it to emit/receive tokens at each group boundary. What I expect: using text rather than neural activations at those boundaries should degrade performance.
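A rough sketch of what I mean (PyTorch, untested; the `SlicedTransformer` name, layer sizes, and the argmax bottleneck are all just illustrative assumptions, and positional encodings/masking are omitted for brevity):

```python
import torch
import torch.nn as nn


class SlicedTransformer(nn.Module):
    """Groups of layers that communicate only through discrete tokens
    at their boundaries, instead of passing continuous hidden states."""

    def __init__(self, vocab_size=32000, d_model=512, n_heads=8,
                 layers_per_group=4, n_groups=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.to_logits = nn.Linear(d_model, vocab_size)
        self.groups = nn.ModuleList([
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
                num_layers=layers_per_group,
            )
            for _ in range(n_groups)
        ])

    def forward(self, token_ids):
        h = self.embed(token_ids)
        for group in self.groups:
            h = group(h)
            # Boundary bottleneck: collapse activations to token ids, then
            # re-embed them as the next group's input. Argmax is
            # non-differentiable, so a real training run would need
            # Gumbel-softmax or a straight-through estimator here.
            boundary_tokens = self.to_logits(h).argmax(dim=-1)
            h = self.embed(boundary_tokens)
        return self.to_logits(h)


# Compare against a baseline with the same total layers but no boundary
# discretization; my prediction is the discretized version underperforms.
model = SlicedTransformer()
logits = model(torch.randint(0, 32000, (2, 16)))
```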
This is something you can observe in our own societies: intelligence doesn't compose. You don't double a group's overall intelligence by doubling its number of members; at best you'll see diminishing returns, at worst overall intelligence will decrease.