Comment by Alifatisk

2 days ago

> MAI-Thinking-1 is built with enterprise readiness in mind. It supports long context with a 256k token window

Isn’t 1M becoming the norm?

5 comments

vb-8448 2 days ago

1M is only marketing; in my experience, quality noticeably drops above 150k.

Claude Code will suggest starting a new session or compacting if you go above 100k.

Bolwin 1 day ago

In my experience above 60k quality noticeably drops.
30k for open-source models.

stingraycharles 2 days ago

Yes, it is, but I can imagine that they want to start out a bit smaller to see how well things scale, and/or have not yet had time to optimize for large context windows.

droidjj 2 days ago
I struggle to get quality results from the frontier models at contexts > 256k anyway.
- stingraycharles 2 days ago
  
  Yup, same experience. It’s because attention basically has quadratic complexity in the context length. So at large context windows, they need to compress the attention (e.g. by grouping multiple tokens together), which then leads to a loss in accuracy.
  It’s almost always better to keep your context windows small.
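
For a rough sense of how attention cost grows with context length, here is a minimal sketch. It assumes standard dense self-attention, where every token attends to every other token, so the score matrix has n × n entries. The function name is made up for illustration, and it only counts entries for a single head in a single layer; real models use optimizations that change the constants, so treat the numbers as illustrative only.

```python
def attention_score_entries(context_tokens: int) -> int:
    """Entries in the full n x n attention score matrix (one head, one layer).

    Assumption: plain dense self-attention with no sparsity or compression.
    """
    return context_tokens * context_tokens


BASELINE = 30_000  # smallest context size mentioned in the thread

# Context sizes taken from the thread above.
for n in (30_000, 60_000, 150_000, 256_000, 1_000_000):
    ratio = attention_score_entries(n) / attention_score_entries(BASELINE)
    print(f"{n:>9,} tokens: {attention_score_entries(n):.2e} entries, "
          f"{ratio:,.1f}x the {BASELINE:,}-token cost")
```

Under that assumption, doubling the context quadruples the work, and going from 256k to 1M tokens costs roughly 15x more, which is one reason providers might compress attention at long contexts.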