Comment by yahoozoo

7 months ago

How are they doing this? Does it just make heavy use of web searches? A continuously updated RAG store? Why don’t other companies do it?

2 comments

mike_hearn 7 months ago

Nothing stops you from continuously training a foundation model and serving checkpoints, but historically there were weird cliffs and instabilities where more training would make things worse rather than better. The trick is to introduce more data into the pre-training mix and keep training in ways that don't cause the model to regress. Presumably they've figured that out.
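To make the "pre-training mix" idea concrete, here is a minimal sketch of one common way to do it: every batch blends fresh examples with replayed older ones, so the model keeps seeing the old distribution while it learns the new one. The function name `mixed_batches` and the `new_fraction` parameter are made up for illustration, and nothing here is known to be what xAI actually does.

```python
import random


def mixed_batches(old_data, new_data, new_fraction=0.25, batch_size=8, steps=100, seed=0):
    """Yield batches that blend fresh examples with replayed older ones.

    Hypothetical helper: `new_fraction` is an assumed knob controlling how much
    of each batch comes from newly collected data.
    """
    rng = random.Random(seed)
    n_new = round(batch_size * new_fraction)
    n_old = batch_size - n_new
    for _ in range(steps):
        # Sample with replacement so small datasets still work.
        batch = rng.choices(new_data, k=n_new) + rng.choices(old_data, k=n_old)
        rng.shuffle(batch)
        yield batch
```

Each yielded batch would be fed to an ordinary training step. Keeping `new_fraction` small is the conservative choice: the model drifts toward the new data slowly instead of overwriting what it already knows.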

It's probably enabled by the huge datacenter xAI has. Most AI labs haven't built their own datacenter, and have to choose between doing experiments on new architectures, serving live traffic and doing more training on their existing models. Perhaps xAI can do all three simultaneously.

jasonjmcghee 7 months ago

In 2021, Google (DeepMind) did RETRO, which was RAG at multi-trillion-token scale.

https://deepmind.google/discover/blog/improving-language-mod...
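For anyone unfamiliar with the RAG side of this, here is a toy sketch of a retrieval store that can be updated without touching the model weights. It uses bag-of-words cosine similarity purely for illustration. RETRO itself retrieves with frozen BERT embeddings and feeds neighbours in through chunked cross-attention, so this is not its method. `RetrievalStore`, `embed`, and `cosine` are hypothetical names.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: a bag-of-words count vector (assumption, not a real encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class RetrievalStore:
    """Hypothetical in-memory store; new documents are searchable as soon as they are added."""

    def __init__(self):
        self.docs = []

    def add(self, text):
        self.docs.append((text, embed(text)))

    def top_k(self, query, k=1):
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [text for text, _ in ranked[:k]]
```

The retrieved passages would then be handed to the model alongside the query. The point is that freshness comes from calling `add`, not from more training.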