Comment by davebren
6 hours ago
Pre-training is not a good term if you are trying to compare it to LLM pre-training. Closer would be the model's architecture and learning algorithms which has been designed through decades of PhD research, and my point on that is that the differences are still much greater than the similarities.
No comments yet
Contribute on Hacker News ↗