Comment by ijk
4 days ago
In aggregate? Signs point to yes. For the general-purpose SFT base models, anyway. We see some evidence of this even with RNNs vs. Transformers. You're essentially finding a function that models language: use the same optimization objective, get a similar result.
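A minimal sketch of that point (the shapes and stand-in logits are my own illustration, not from the thread): the training objective is the same next-token cross-entropy regardless of whether the logits come from an RNN or a Transformer, so both architectures are being pulled toward the same target function.

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits, targets):
    # logits: (batch, seq_len, vocab_size); targets: (batch, seq_len)
    # The objective doesn't know or care what architecture produced the logits.
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))

vocab, batch, seq = 100, 2, 8
targets = torch.randint(vocab, (batch, seq))
rnn_logits = torch.randn(batch, seq, vocab)          # stand-in for an RNN's output
transformer_logits = torch.randn(batch, seq, vocab)  # stand-in for a Transformer's output

# Identical loss function applied to both architectures' outputs.
print(next_token_loss(rnn_logits, targets))
print(next_token_loss(transformer_logits, targets))
```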
However, RL and especially RLHF do a lot to reshape the responses, and that's potentially a lot more varied. For training that wasn't just cribbed from ChatGPT, anyway.
Lastly, it's unlikely that you'll get the _exact same_ responses; there are too many variables at inference time alone. And as for training, we can fingerprint models by their vocabulary to a certain extent. So in practical terms there are probably always going to be some differences.
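A toy sketch of that kind of vocabulary fingerprint (the sample strings and the cosine-over-word-counts comparison are illustrative assumptions, not a specific published method): build a word-frequency profile for text from each model, then measure how close the profiles are.

```python
from collections import Counter
import math

def word_profile(text):
    # Crude vocabulary fingerprint: bag-of-words frequency counts.
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    keys = set(a) | set(b)
    dot = sum(a[k] * b[k] for k in keys)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Made-up samples standing in for outputs from two different models.
sample_a = "delve into the tapestry of ideas and delve deeper"
sample_b = "let us explore the landscape of ideas and explore further"

print(cosine_similarity(word_profile(sample_a), word_profile(sample_b)))
```

Real fingerprinting would need far more text and smarter statistics, but even word counts like these separate models that overuse characteristic phrasing.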
This assumes our current training approaches don't change too drastically, of course.