MAI-Thinking-1

2 days ago (microsoft.ai)

https://microsoft.ai/wp-content/uploads/2026/06/main_2026060...

Launching seven new MAI models: https://microsoft.ai/news/building-a-hillclimbing-machine-la...

> Second, clean data. MAI-Thinking-1 was trained on clean and appropriately licensed data, with AI-generated content excluded from pre-training. This matters for quality, provenance, and control. If we cannot account for what shaped a model, we cannot fully understand its behavior or credibly improve it.

Shots fired?

It would be interesting to see how far "clean data" can go on the scaling laws.

  • I would really like to see what "appropriately licensed data" means. Cannot imagine they didn't copy all open repos on GitHub, and can't imagine they asked for permission, or are reproducing license texts from these repos now. It sounds hand-wavy.

    P.S. A fairly basic website otherwise, but it unfortunately seems to be hacking scroll for no good reason.

    • Presumably their position remains that training on public repos is fair use and doesn't require a license. If it doesn't require a license it's still "appropriately licensed".

    • Recently, GitHub has changed their terms of service to use all user data for AI training unless users explicitly opt out. This is probably the way Microsoft has obtained "appropriately licensed data".

  • I doubt any lab would say otherwise, they all _claim_ to use licensed data

    • Maybe, but Microsoft, through their partnership with OpenAI, is already involved in major copyright lawsuits. That is probably a driving force for this move, actually... I doubt they would want to tempt fate while those lawsuits are on-going.

  • All the labs "clean" their pretraining data, and you can keep your pretraining data minimally AI-generated while still spamming synthetic data in post-training.

  • I'd assume it's not up to par with Qwen-3.5 then, which has been distilling Claude, and the quality of the model is probably a direct result of that.

  • I'm interested in how much of the "Clean Data" is synthetic data from "unclean" models...

  • Interesting. Wasn't their previous attempt (Phi) trained mostly on synthetic data?

It's good that there is a new player on the market, though I take benchmark tables with a grain of salt. As for the presentation, it's funny to see how clearly their website is inspired by other AI company blogs, with the extra innovation of a hijacked scrollbar.

The benchmarks are a bit of a disaster? It's at about DeepSeek V3.2 level, but with about 50% more parameters. Loses handily to the also smaller GLM-5.1, and even worse to the similarly sized Kimi K2.6.

  • Yes and no. Yes from a user PoV, I don't really see a great reason to use this other than for enterprises that care about using a model not trained on copyrighted data (not sure what the market really is for this anymore, feels like this concern has been forgotten by most customers).

    From a strategic PoV for MS, all the models you cited are distilling GPT/Claude/Gemini and wouldn't be anywhere near as good as they are without this distillation, which in turn means you are dependent on OAI/Anthropic/G first shipping a good model to generate data for your training. This MAI model is trained from scratch with no synthetic data or distillation. So in terms of benchmarks it's obviously much harder to get strong scores, and thus not a disaster if they can keep on improving.

  • They claim to not be training to the benchmarks at all. It'll be interesting to see how it stacks up in actual use.

Does this mean that work created with it can be copyrighted? Since the courts ruled that the inclusion of pilfered IP was the reason other models' work cannot be copyrighted, I would think so! In that case, this is a completely different beast. It can maybe be used for things that need a durable copyright.

Looks like the OAI divergence is finally taking place. Seems like the comparisons are mainly with Opus 4.6 and GPT 5.4 though. Still, exciting to see a new frontier player.

  • Is it a frontier player though, or perhaps a new benchmaxxed model? People were saying similar things about Grok but it ultimately amounted to little.

    • "preferred by humans over Sonnet 4.6" makes it pretty clearly not benchmaxxed though.

      At least when you define benchmaxxed as "good in benchmarks but not human preference".

  • Post 4.6 Anthropic models do not exactly have a stellar reputation, so that choice is smart.

> MAI-Thinking-1 is a 35B-active, ~1T-total parameters, sparse Mixture of Experts model, a smaller inference footprint than much larger models.

This seemingly nonsensical sentence (of course this will have a smaller inference footprint than larger models) suggests this model's competitors have larger inference footprints and total parameter sizes.
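For a rough sense of what the quoted numbers imply, here is a minimal back-of-the-envelope sketch. It assumes the quoted figures (35B active, ~1T total) are accurate and that per-token compute tracks the active parameters while memory tracks the total, which is a simplification. The function name is made up for illustration.

```python
def active_fraction(active_params: float, total_params: float) -> float:
    """Share of a sparse MoE model's parameters that are used per token."""
    if total_params <= 0:
        raise ValueError("total_params must be positive")
    return active_params / total_params


# Figures from the quoted sentence; "~1T" is approximate, so treat the
# result as a ballpark rather than an exact number.
ACTIVE = 35e9
TOTAL = 1e12

frac = active_fraction(ACTIVE, TOTAL)
print(f"{frac:.1%} of parameters active per token")  # prints "3.5% ..."
```

So only a few percent of the weights participate in any one token, even though all ~1T still have to be stored somewhere to serve the model.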

> MAI-Thinking-1 is built with enterprise readiness in mind. It supports long context with a 256k token window

Isn’t 1M becoming the norm?

  • 1M is only marketing; in my experience quality drops noticeably above 150k.

    Claude Code will suggest that you start a new session or compact if you go above 100k.

    • In my experience above 60k quality noticeably drops.

      30k for open source models

  • Yes it is, but I can imagine that they want to start out a bit smaller to see how well things scale, and/or did not yet have the time to work on optimizing for the large context windows.

I like it so much when a website hijacks the way my scroll works. This is truly innovative.

At least there shouldn't be any complaints about benchmaxing this time.

  • Just because it is performing rather poorly by comparison, it doesn’t mean it isn’t benchmaxxed. It can still be worse than it appears.

They've hijacked scrolling. They've hijacked the spacebar. It flickers like crazy when I try to move through the article. Trying to get through it is an exercise in madness.

Honestly, a lame release of mediocre models.

I was most excited about the "frontier tuning." Like, it will actually watch you do stuff and learn to do it for you? That would be actually interesting.

But no, it's just a data labelling interface: https://learn.microsoft.com/en-us/microsoft-365/copilot/copi.... You have to provide the instruction and give feedback, and there is a whole UI with an hour-long wait between steps. So basically they want you to do the labelling to train a model, or at least that's how it looks from the outside.

Also, the mission statement of Humanist AI is about as boring as it gets, yet tries to sound way too grand. Like "all the cool labs have a mission statement, so we should also have one" vibes.

7 models launched. 5 models in the dropdown. Only 4 actually usable :(

About time Microsoft joined the fray. After the OpenAI divorce, it really looked like Microsoft was going to become another Uber.