Comment by starchild3001
6 months ago
> Are all these post/mid-training tweaks important with abundant, verified, synthetic domain data?
No. Many are aimed at cleaning/aligning noisy, mixed-domain data. With abundant, high-quality domain data, you can skip most of the complexity and focus on direct SFT/RL on your corpus.
> Can a small team stick to scaling 2024-era best practices?
2024 was the year of SFT. I believe fitting reasoning traces to your final responses via RL is the technique-du-jour of 2025. Jumping from SFT to RL training might be the biggest gain here, if RL can be applied to your problem (e.g., math, coding).
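A minimal sketch of the verifiable-reward idea that makes math/coding domains RL-friendly (all names here are hypothetical; a real setup would plug this reward into a policy-gradient loop such as PPO or GRPO):

```python
import re

def extract_final_answer(completion: str):
    # Assume the model is prompted to end its reasoning trace
    # with a line of the form "Answer: <value>".
    match = re.search(r"Answer:\s*(\S+)", completion)
    return match.group(1) if match else None

def verifiable_reward(completion: str, gold_answer: str) -> float:
    # Binary reward: 1.0 if the extracted answer matches the
    # known-correct (verified) answer, else 0.0. Checkable rewards
    # are what make domains like math and coding amenable to RL.
    answer = extract_final_answer(completion)
    return 1.0 if answer == gold_answer else 0.0

# Example: a reasoning trace followed by a final answer.
trace = "First compute 12 * 7 = 84, then add 16. Answer: 100"
print(verifiable_reward(trace, "100"))  # 1.0
```

With abundant verified domain data, the `gold_answer` side comes essentially for free, which is why the jump from SFT to RL is cheap in those domains.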