Comment by azinman2
15 days ago
Yes that’s a pretty giant accusation, especially given they’re buying boatloads of GPUs and have previous versions as well (it’s not like they’re starting with 3).
1) Grok-2 was akin to GPT-3.5
2) Grok-3 comes out a month after DeepSeek R1 was open sourced. I think Grok-3 is DeepSeek R1 with some added params and about a month of training on the giant cluster, possibly a bit of in-house secret sauce added to the model or training methodology.
What are the chances that XAI just happened to have a thinking model close to as good as revolutionary DeepSeek but happened to launch it 30 days later?
It was both smart and pragmatic for XAI to simply take the best available open-source model and layer their own work on top of it. Imagine they doubled the parameter count and trained it for 30 days; that would not even use half of the GPU power!
> What are the chances that XAI just happened to have a thinking model close to as good as revolutionary DeepSeek but happened to launch it 30 days later?
Extremely, extremely good. That was in fact the real point of the DeepSeek paper: it was extremely cheap to turn a frontier(ish?) model into a reasoning model. There is nothing suspicious about this timeline from an ML Ops point of view.
In fact, DeepSeek themselves, in a sort of victory lap, released six OTHER models, built on base models from other providers and fine-tuned with reasoning, as part of the initial drop.
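For what it's worth, the recipe the R1 paper describes for those distilled models is essentially plain supervised fine-tuning on reasoning traces sampled from the stronger reasoning model, and the results (e.g. DeepSeek-R1-Distill-Qwen-7B) are on Hugging Face. A minimal sketch of that idea is below; the base model name and the tiny in-line dataset are illustrative placeholders, not what DeepSeek or XAI actually used:

```python
# Sketch of reasoning distillation via supervised fine-tuning:
# train a base model on chain-of-thought traces produced by a teacher
# reasoning model. Base model and data here are placeholders.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "Qwen/Qwen2.5-7B"  # any open base model; illustrative choice only
tok = AutoTokenizer.from_pretrained(base)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Each record pairs a prompt with a long reasoning trace + final answer
# generated by the teacher model. Placeholder example.
traces = [
    {"prompt": "What is 17 * 24?",
     "trace": "<think>17*24 = 17*20 + 17*4 = 340 + 68 = 408</think> 408"},
]

def collate(batch):
    texts = [r["prompt"] + "\n" + r["trace"] + tok.eos_token for r in batch]
    enc = tok(texts, return_tensors="pt", padding=True, truncation=True)
    enc["labels"] = enc["input_ids"].clone()  # standard causal-LM targets
    return enc

loader = DataLoader(traces, batch_size=1, collate_fn=collate)
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for batch in loader:
    loss = model(**batch).loss  # next-token cross-entropy on the traces
    loss.backward()
    opt.step()
    opt.zero_grad()
```

The point is how little machinery is involved: given a strong teacher and its traces, turning an existing model into a reasoning model is an ordinary fine-tuning job, not a from-scratch training run.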
Perhaps Grok-3 used the reasoning methodology from DeepSeek more than the underlying model, but the similarity of Grok-3's results to DeepSeek's suggests that XAI used more than that.