Comment by ithkuil
1 year ago
I think there are two distinct areas. One is the building of the representations, which is achieved by fitting. The other is loosely defined as "computing", which is some kind of search for a path through representation space. All of that is wrapped in a translation layer that can turn those representations into stuff we humans can understand and interact with. Current transformer architectures achieve all of this to some extent, but I guess some believe they are not quite as effective at the "computation/search" stage.
But how does it get good at "computing"? The way I see it, we either program them to do so manually, or we use ML, in which case the model "fits" the computation based on training examples or environmental feedback, no? What am I missing?
The distinction is fuzzy indeed, especially if anything that you "program in manually" has some parameters that are learned.
Conceptually we already have parts of the model that are not learned: the architecture of the model itself.
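To make that concrete, here's a minimal PyTorch sketch (the TinyMLP class and its sizes are made up for illustration, not taken from the thread): the class definition is the architecture, chosen by hand and never updated by training, while the weights inside the layers are the part that is learned by fitting.

```python
import torch
import torch.nn as nn

class TinyMLP(nn.Module):
    def __init__(self, d_in: int = 16, d_hidden: int = 32, d_out: int = 4):
        super().__init__()
        # Architecture: a fixed structure decided by a human, not learned.
        self.net = nn.Sequential(
            nn.Linear(d_in, d_hidden),
            nn.ReLU(),
            nn.Linear(d_hidden, d_out),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

model = TinyMLP()
# Parameters: the learned part, adjusted by gradient descent during fitting.
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
x, y = torch.randn(8, 16), torch.randn(8, 4)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
```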