Comment by suninsight

8 months ago

Key questions:

1. The key data point seems to be Figure 6a. Where it compares performance on BABILong and claims Titans performance is at ~62%, as compared to GPT-4o-mini at ~42% for 100k sequence length.

However, GPT-4o and Claude are missing in this comparison - maybe because they perform better ?

2. There is no example provided of the Neural Memory Module in action. This is the first question I would ask of this paper.

3 comments

suninsight

tigershark 8 months ago

The biggest model that they have used has only 760M parameters, and it outperforms models 1 order of magnitude larger.

NotAnOtter 8 months ago

Gah dmn

ttul 8 months ago

This paper was written by a very small team at Google. It strikes me as similar in that regard to the original transformers paper. If this technique scales well, Google is no doubt already exploiting it for their next generation models -- and I think there are signs that Gemini 2.0 models already exploit this.