Comment by ithkuil

2 days ago

"Every problem in computer science can be solved with another level of indirection."

One could argue that the attention mechanism in transformers is already designed to do that.

But you need to train it with that specifically in mind if you want it to get better at damping attention to the parts of the context that the subsequent evolution of the conversation renders irrelevant.
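To make "damping" concrete, here's a minimal sketch in Python/NumPy. The per-token `relevance` vector and the additive logit bias are assumptions for illustration; a real model would have to learn this behavior internally rather than be handed the scores.

```python
import numpy as np

def damped_attention(q, k, v, relevance, damping=4.0):
    """Scaled dot-product attention with an additive relevance bias.

    Sketch only: `relevance` is a hypothetical per-key score in [0, 1];
    low-relevance tokens get their attention logits pushed down before
    the softmax, so they receive less weight.
    """
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)            # standard attention logits
    logits -= damping * (1.0 - relevance)    # damp low-relevance keys
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```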

And that requires the black art of ML training.

Whereas doing this as a hack on top of the chat product feels more like engineering, and that's territory we're more familiar with as a field.
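For contrast, a sketch of what that engineering-side hack might look like: score older turns for relevance outside the model and drop the losers before building the prompt. `score_relevance` is a placeholder for whatever cheap heuristic you'd actually plug in (embedding similarity, a small classifier); none of this reflects any particular chat product's API.

```python
def prune_history(messages, score_relevance, keep=0.5, min_recent=4):
    """Drop low-relevance older turns before sending context to the model.

    `score_relevance` is a hypothetical callable returning a relevance
    score per message; the most recent `min_recent` turns are always kept.
    """
    older, recent = messages[:-min_recent], messages[-min_recent:]
    # rank older turns by relevance, keep the top fraction
    ranked = sorted(range(len(older)),
                    key=lambda i: score_relevance(older[i]),
                    reverse=True)
    kept_idx = sorted(ranked[: int(len(ranked) * keep)])  # restore order
    return [older[i] for i in kept_idx] + recent
```

The appeal is exactly that it's boring: no gradients, no retraining, just a filter in front of the prompt builder that you can tune and A/B test like any other product feature.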