Comment by techsystems
2 days ago
How does the context length scaling at 256K tokens compare to Llama's 1M in terms of performance? How are the contexts treated differently?