Comment by kamranjon
6 days ago
You can call it pre-training, but it’s based on Gemma 3 4B, which was already pre-trained on a general corpus. It’s the same process, so you’re just splitting hairs. That is kind of my point: fine-tuning is just more training. If you’re going to say that fine-tuning is useless, you are basically saying that all instruct-tuned models are useless as well, because they are all just pre-trained models that have been subsequently trained (fine-tuned) on instruction datasets.
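To make the point concrete, here is a minimal sketch (not the actual Gemma training code; the checkpoint id, dataset strings, and hyperparameters are illustrative assumptions) showing that fine-tuning runs the exact same next-token-prediction step as pre-training; only the data fed in changes:

```python
# Sketch: "pre-training" and "fine-tuning" share one training step --
# same causal-LM loss, same optimizer -- only the text source differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-3-4b-pt"  # assumed checkpoint name; any causal LM works the same way
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def train_step(texts):
    """One gradient step of next-token prediction; identical in both phases."""
    batch = tokenizer(texts, return_tensors="pt")
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()

# "Pre-training": raw corpus text.
train_step(["The mitochondria is the powerhouse of the cell."])

# "Fine-tuning": the same step, fed instruction-formatted text instead.
train_step(["<start_of_turn>user\nSummarize photosynthesis.<end_of_turn>\n"
            "<start_of_turn>model\nPlants convert light into chemical energy.<end_of_turn>"])
```

The instruct-tuned checkpoints are produced by exactly this kind of continued training on instruction-formatted data.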