Comment by kamranjon
6 days ago
You can call it pre-training, but it’s based on Gemma 3 4B, which was already pre-trained on a general corpus. It’s the same process, so you’re just splitting hairs. That is kind of my point: fine-tuning is just more training. If you’re going to say that fine-tuning is useless, you are basically saying that all instruct-tuned models are useless as well, because they are all just pre-trained models that have been subsequently trained (fine-tuned) on instruction datasets.
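To make the point concrete, here is a minimal sketch (not the actual Gemma training code; the checkpoint id, dataset strings, and hyperparameters are illustrative assumptions) showing that fine-tuning runs the exact same next-token-prediction step as pre-training; only the data fed in changes:

```python
# Sketch: "pre-training" and "fine-tuning" share one training step --
# same causal-LM loss, same optimizer -- only the text source differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-4b-pt"  # assumed checkpoint name; any causal LM works the same way
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def train_step(texts):
    """One gradient step of next-token prediction; identical in both phases."""
    batch = tokenizer(texts, return_tensors="pt")
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()

# "Pre-training": raw corpus text.
train_step(["The mitochondria is the powerhouse of the cell."])

# "Fine-tuning": the same step, fed instruction-formatted text instead.
train_step(["<start_of_turn>user\nSummarize photosynthesis.<end_of_turn>\n"
            "<start_of_turn>model\nPlants convert light into chemical energy.<end_of_turn>"])
```

The instruct-tuned checkpoints are produced by exactly this kind of continued training on instruction-formatted data.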