Comment by freehorse
2 days ago
My guess is that it is safer for them to use synthetic data only, as they have less to worry about stuff like people using the models for erotic roleplay and similar stuff.
2 days ago
My guess is that it is safer for them to use synthetic data only, as they have less to worry about stuff like people using the models for erotic roleplay and similar stuff.
Why "worry"?
Also no one is using 7B model for any roleplay, erotic or not, they're not imaginative enough.
I mean if the model is intended to be deployed somewhere for some purpose by some business, as a business you probably would not want to worry whether users may use to produce sth that could be seen as embarrassing or problematic pr-wise. Having a sanitized and more directed training set can help with that. MS does not produce only 7B models, and 7B models can still say "embarrassing" things. I may be wrong of course.