I doubt it. If it isn't released, Chinese companies will be unable to break the TOS and use it to acquire high-quality training data... which, I suspect, is how they've kept pace.
Z.AI, Moonshot, DeepSeek all have a pipeline of data of their own now due to capturing a slice of the market through cheap tokens. It's not impossible to imagine that they might share the data too if the CCP thinks that will help their AI strategy.
No. Most data generated this way is poor quality. It's not about the user responses and/or queries. If the user does not know better than the LLM, you can end up with bad responses. The value is in taking a superior model, submitting a query, getting a higher-quality output than you could have generated yourself, and using that to boost your model.
If DeepSeek is anything to go by, they are still significantly behind.
Ominous phrasing.