
Comment by cbg0

1 day ago

One of the things I'm always looking at with new models released is long context performance, and based on the system card it seems like they've cracked it:

  GraphWalks BFS 256K-1M

  Mythos    Opus     GPT5.4
  80.0%     38.7%    21.4%

Huh, I don’t know exactly what “long context performance” means in these tests, so this is completely anecdotal: comparing gpt5.4 via Codex CLI against Opus in Claude Code, gpt5.4 seems to do significantly better on long contexts, I suspect partly due to some special context compaction stored in encrypted blobs. On long conversations, Opus in Claude Code will lose memory of what we were working on earlier, whereas one of my Codex chats is already at >1B tokens and is still very coherent, remembering things I asked of it at the beginning of the conversation.

this seems to be similar to gpt-pro: they just have a very large attention window (which is why it's so expensive to run). the true attention window of most models is something like 8192 tokens.

  • What's the "attention window"? Are you alleging these frontier models use something like SWA (sliding-window attention)? Seems highly unlikely.

    • well the attention scores form an n x n matrix at the end of the day, which scales quadratically with context length; naively materializing it for 1M tokens would take terabytes per head per layer, far more than any single accelerator holds. They may have larger windows, such as 16k to 32k, but you can just see how GLM models work for more information.

      Deepseek is the frontrunner in this technology afaik.

  • source on the 8192 tokens number? i'm vaguely aware that some previous models attended more to the beginning and end of conversations, which doesn't seem to fit a simple contiguous "attention window" within the greater context, but would love to know more

    • well 8192 is just the first number that came to my mind; obviously frontier models have 32k or above, but essentially they have a layer which "looks" at a limited view of the entire context window: {[1M x a few weights] attention layer to determine what is actually important} -> {all other layers}
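For a sense of scale on the quadratic-memory point above, here's a back-of-envelope sketch in Python. The byte count and the per-head/per-layer framing are illustrative assumptions, not any real model's actual footprint (real implementations like fused/flash-style attention avoid materializing this matrix at all):

```python
# Rough memory cost of materializing a full n x n attention score
# matrix in fp16 (2 bytes per score), per head, per layer.
# Purely back-of-envelope; not any particular model's real numbers.

def attn_matrix_bytes(n_tokens: int, bytes_per_score: int = 2) -> int:
    """Bytes needed to store one dense n x n attention score matrix."""
    return n_tokens * n_tokens * bytes_per_score

for n in (8_192, 32_768, 1_000_000):
    gib = attn_matrix_bytes(n) / 2**30
    print(f"{n:>9} tokens -> {gib:,.1f} GiB per head per layer")
    # -> roughly 0.1 GiB, 2.0 GiB, and 1,862.6 GiB respectively
```

So a dense 1M-token score matrix is ~2 TB per head per layer when naively materialized, which is why nobody actually stores it.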
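And a minimal sketch of the "limited view" idea being debated, assuming a simple causal sliding window (the window size, function name, and mask representation are all illustrative, not taken from any real model):

```python
# Sliding-window attention (SWA) mask sketch: each query position may
# only attend to itself and the previous `window - 1` keys, so the
# score matrix is effectively a band instead of a dense n x n block.

def swa_mask(n: int, window: int) -> list[list[bool]]:
    """mask[q][k] is True iff query q may attend to key k (causal + windowed)."""
    return [[q - window < k <= q for k in range(n)] for q in range(n)]

# Visualize a tiny 6-token example with a window of 3:
for row in swa_mask(6, window=3):
    print("".join("x" if allowed else "." for allowed in row))
```

Each row has at most `window` allowed positions, so memory and compute per query stay constant as the context grows, at the cost of queries never directly seeing tokens outside their band.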