Source? This would defy a lot of convention and would cause a lot of instability.
This is what it says in the supposed system prompt; see https://news.ycombinator.com/item?id=44517453
This seems more like 'LLM psychology' than evidence of a rolling model; in other words, I would take that prompt as evidence that they don't want users to interrogate the cutoff date, rather than as evidence that they're somehow using a rolling model.
How are they doing this? Does it just make heavy use of web searches? A continuously updated RAG store? Why don’t other companies do it?
Nothing stops you from continuously training a foundation model and serving checkpoints, but historically there were weird cliffs and instabilities where more training made things worse rather than better. The trick is to introduce fresh data into the pre-training mix and keep training in ways that don't cause the model to regress. Presumably they've figured that out.
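Nobody outside the lab knows the actual recipe, but the standard continual-learning move is replay mixing: keep most of every batch drawn from the original pre-training mix so the fresh data doesn't cause catastrophic forgetting, and only promote a checkpoint to serving if evals don't regress. A toy PyTorch sketch of that loop (the model, data sampling, replay ratio and checkpoint cadence are all invented for illustration, not anything xAI has disclosed):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    VOCAB, DIM = 100, 32
    # stand-in for a foundation model: embed tokens, predict the next one
    model = nn.Sequential(nn.Embedding(VOCAB, DIM), nn.Linear(DIM, VOCAB))
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)  # low LR limits drift

    def batch(source, n=16, seq=8):
        # stand-in for sampling token sequences from a named data source
        return torch.randint(0, VOCAB, (n, seq))

    REPLAY_RATIO = 0.7  # fraction of each batch drawn from the original mix

    for step in range(1, 1001):
        # mix replayed old data with fresh data so the model doesn't regress
        old = batch("original_pretraining_mix", n=int(16 * REPLAY_RATIO))
        new = batch("fresh_web_crawl", n=16 - old.shape[0])
        tokens = torch.cat([old, new])

        logits = model(tokens[:, :-1])  # predict token t+1 from token t
        loss = nn.functional.cross_entropy(
            logits.reshape(-1, VOCAB), tokens[:, 1:].reshape(-1)
        )
        opt.zero_grad()
        loss.backward()
        opt.step()

        if step % 250 == 0:
            # candidate checkpoint; only serve it if eval doesn't regress
            torch.save(model.state_dict(), f"checkpoint_{step}.pt")

The replay ratio is the key knob: too low and the model forgets what it knew, too high and it barely absorbs the new data.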
It's probably enabled by the huge datacenter xAI has. Most AI labs haven't built their own datacenter, and have to choose between doing experiments on new architectures, serving live traffic and doing more training on their existing models. Perhaps xAI can do all three simultaneously.
In 2021, Google did RETRO, which was RAG at multi-trillion-token scale.
https://deepmind.google/discover/blog/improving-language-mod...
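The retrieval side is simple enough to sketch in plain Python. Here the bag-of-words "embedding" stands in for RETRO's learned encoder and the three-document corpus for its trillions-of-tokens database (all of it invented for illustration); a "continuously updated RAG store" just means new documents keep getting embedded and added to the index:

    import math
    from collections import Counter

    corpus = [
        "RETRO retrieves from a trillions-of-tokens database.",
        "A rolling model instead keeps pre-training on fresh data.",
        "Vector indexes make nearest-neighbour lookup fast at scale.",
    ]

    def embed(text):
        # crude bag-of-words vector; a real system uses a learned encoder
        return Counter(text.lower().split())

    def cosine(a, b):
        dot = sum(a[w] * b[w] for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    # appending to this list as new documents arrive is all a
    # "continuously updated RAG store" means
    index = [(embed(doc), doc) for doc in corpus]

    def retrieve(query, k=2):
        q = embed(query)
        return [doc for _, doc in sorted(index, key=lambda e: -cosine(q, e[0]))[:k]]

    query = "how does a rolling model stay fresh?"
    prompt = "\n".join(retrieve(query)) + "\n\nQuestion: " + query
    print(prompt)  # the augmented prompt is what the LLM actually sees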