lemonish97 2 days ago
What is your evidence for this claim?

fooker 2 days ago
They say hill climbing: https://microsoft.ai/news/building-a-hillclimbing-machine-la...
Unless they specifically clarify that the testing and training benchmarks are completely separate, we have to assume they test on the same 'hill' the model climbs.

artemisart 2 days ago
Hill climbing doesn't mean much but absolutely doesn't imply they cheat on benchmarks. They have more details here: https://microsoft.ai/news/introducing-mai-thinking-1/ It seems to be "RL on everything".

jongalloway2 2 days ago
[dead]

ajyoon 2 days ago
[flagged]