Comment by schnitzelstoat
5 hours ago
Is such a large context window even desirable? It seems like it would consume an awful lot of tokens and, unless one was very careful to curate the context, could even result in worse performance.
Yes. That is, it is if you imagine a magically good self-attention mechanism that could decide what in the context to attend to at any one moment. Then it would be like working with a polymath who has incredible memory. Or bringing in that aged but still senior Chief of Staff of a large company who knows where all the bodies are buried, and why every decision was made at the time it was made; or a professor of film who has seen and can remember thousands of films.
Shockingly, we seem to have found a self-attention mechanism of that quality; it just has the sad property that its cost grows as O(N^2), where N is the context length.
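To make the quadratic term concrete, here is a minimal NumPy sketch of scaled dot-product attention (shapes and names are illustrative, not any particular model's implementation). The intermediate score matrix is N x N, which is where the O(N^2) memory and compute come from as the context grows:

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention over a context of N tokens.

    Q, K, V: (N, d) arrays. The score matrix S is (N, N) --
    the term that grows quadratically with context length N.
    """
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)               # (N, N) pairwise scores
    S = S - S.max(axis=-1, keepdims=True)  # subtract row max for stability
    W = np.exp(S)
    W = W / W.sum(axis=-1, keepdims=True)  # softmax over each row
    return W @ V                           # (N, d) weighted values

N, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
out = attention(Q, K, V)
print(out.shape)  # (1024, 64); the score matrix was 1024 x 1024
```

Doubling N doubles the output size but quadruples the score matrix, which is why 8k-token contexts once felt large.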
I remember when a large context was 8k! Nowadays that would seem extremely small, because we have new use-cases that require much larger context sizes. Maybe in the future, we will invent ways to use inference on very large contexts that we cannot even imagine today.
That's either the R&D part of this chip, or Nvidia has the use case.
Nvidia uses ML for fine-tuning and architecting their chips; this might be one use case.
Another one would be to put EVERYTHING from your company into this context window. It would be easier to create 'THE' model for every company or person. It might also be safer than having a model trained on your data, because you don't end up with a model that contains all your data, only memory.
Yes because scaling tends to pay off unexpectedly
Imagine if you were making database software and you could fit the source code of all existing databases and their GitHub issues in context.
For larger codebases ... maybe it will cut down on "let me create a random number wrapper for the 15th time" type problems.
You should already have skills which mention these utilities.
But maybe that’s enough tokens to feed an entire lifetime of user behaviour in for the digital twin dystopia?
"type problems" was doing the heavy lifting there, not literally "this utility".