
Comment by tehryanx

9 hours ago

I know you're right that there's a saturation point for context size, but larger models don't just have bigger context windows; they also ground better within that context because their attention patterns are stronger and more discriminative.

I'm not denying that you can drive confusion by overloading context, but the number of tokens it takes to trigger that failure mode in opus is going to be a lot higher than it is for gpt-oss-20b.

I'm pretty sure a model that can run on a cellphone is going to cap out its context window long before opus or mythos hits the point of diminishing returns from context overload. I think a lower quality model, with far fewer / noisier weights and less precise attention, is going to start producing false positives well before adding context to a SOTA model will.

You can even see it here: AISLE had to print a retraction because someone checked their work and found that just pointing gpt-oss-20b at the patched version consistently generated false positives: https://x.com/ChaseBrowe32432/status/2041953028027379806