Comment by resters

14 days ago

DeepSeek R1 is by far the best at writing prose of any model, including Grok-3, GPT-4o, o1-pro, o3, Claude, etc.

Paste in a snippet from a book and ask the model to continue the story in the style of the snippet. It's surprising how bad most of the models are.
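
For anyone who wants to reproduce the test, here is a minimal sketch, assuming an OpenAI-compatible endpoint; the base URL, model name, and environment variable are assumptions, not anything the commenter specified:

    import os
    from openai import OpenAI

    SNIPPET = "..."  # paste a passage from a book here

    client = OpenAI(
        base_url="https://api.deepseek.com",  # assumed OpenAI-compatible endpoint
        api_key=os.environ["DEEPSEEK_API_KEY"],  # placeholder env var
    )

    resp = client.chat.completions.create(
        model="deepseek-reasoner",  # R1; swap in other models to compare
        messages=[{
            "role": "user",
            "content": "Continue the story below in the exact voice and "
                       "style of this snippet:\n\n" + SNIPPET,
        }],
    )
    print(resp.choices[0].message.content)

Run the same prompt against each model you want to compare and judge the continuations side by side.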

Grok-3 comes in a close second, likely because it is actually DeepSeek R1 with a few mods behind the scenes.

Why do you think that Grok 3 is DeepSeek, out of curiosity?

  • Yes, that’s a pretty serious accusation, especially given they’re buying boatloads of GPUs and have shipped previous versions as well (it’s not like they’re starting with 3).

    • 1) Grok-2 was akin to GPT-3.5

      2) Grok-3 came out a month after DeepSeek R1 was open-sourced. I think Grok-3 is DeepSeek R1 with some added parameters and about a month of training on the giant cluster, possibly with a bit of in-house secret sauce added to the model or the training methodology.

      What are the chances that xAI just happened to have a thinking model nearly as good as the revolutionary DeepSeek R1, yet launched it only 30 days later?

      It was both smart and pragmatic for xAI to simply take the best available open-source model and layer their own work on top of it. Imagine they doubled the parameter count and trained for 30 days; that would not even use half of the cluster's GPU power!


If it was, Elon is even more stupid than he lets on, because:

DS3: ~$5M training run
Grok 3: ~$400M training run

That's roughly 80x the cost for a ~2% difference in the benchmarks.