Comment by Der_Einzige
2 years ago
I hope the long-context models start getting better. Claude 1 and GPT-4-128K both struggle hard once you get past about 32K tokens.
Most of the needle-in-a-haystack papers use tasks that are too simple. They need harder tasks to test whether these long-context models are truly retaining information or not.
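For reference, here is a minimal sketch of the kind of needle-in-a-haystack test being criticized: bury one fact at varying depths in filler text and check retrieval as the context grows. The `query_model` wrapper and the specific needle/filler strings are placeholders, not from any particular paper or API.

```python
# Minimal needle-in-a-haystack sketch. The needle is a fact buried at a
# chosen depth in repeated filler; the comment's claim is that retrieval
# degrades once the context passes roughly 32K tokens.

FILLER = "The sky was grey and the grass was wet. "  # padding sentence
NEEDLE = "The secret passcode is 7-tangerine-42."
QUESTION = "What is the secret passcode? Answer with the passcode only."


def query_model(prompt: str) -> str:
    # Hypothetical stand-in: replace with a real call to whatever
    # long-context model you want to test.
    raise NotImplementedError("plug in your model API here")


def build_prompt(context_chars: int, depth: float) -> str:
    """Bury NEEDLE at a relative depth (0.0 = start, 1.0 = end)
    of a filler document roughly context_chars characters long."""
    n_repeats = context_chars // len(FILLER) + 1
    haystack = (FILLER * n_repeats)[:context_chars]
    insert_at = int(context_chars * depth)
    doc = haystack[:insert_at] + " " + NEEDLE + " " + haystack[insert_at:]
    return doc + "\n\n" + QUESTION


def run_sweep() -> None:
    # Rough estimate: ~4 characters per token.
    for tokens in (8_000, 32_000, 64_000, 128_000):
        for depth in (0.1, 0.5, 0.9):
            answer = query_model(build_prompt(tokens * 4, depth))
            hit = "7-tangerine-42" in answer
            print(f"{tokens:>7} tokens, depth {depth:.1f}: "
                  f"{'PASS' if hit else 'FAIL'}")
```

As the comment notes, this is a pure retrieval task: a single exact-match lookup is much easier than tasks requiring reasoning over or aggregating information spread across the context, which is why passing it doesn't demonstrate that a model genuinely uses its full window.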