← Back to context

Comment by jerjerjer

2 months ago

Is there a benchmark to measure real effective context length?

Sure, gpt-4o has a context window of 128k, but it loses a lot from the beginning/middle.

They often publish "needle in a haystack" benchmarks that look very good, but my subjective experience with a large context is always bad. Maybe we need better benchmarks.