Comment by algo_trader
6 months ago
Are all these "post/mid-training tweaks" important if you have a specific domain with abundant, verified, synthetic data and labels?
Can a small team working on domain-specific ASI stick to scaling a 2024-era best-practices training stack? Or will they miss massive improvements?
> Are all these post/mid-training tweaks important with abundant, verified, synthetic domain data?
No. Many are aimed at cleaning/aligning noisy, mixed-domain data. With abundant, high-quality domain data, you can skip most of the complexity and focus on direct SFT/RL on your corpus.
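In that regime the loop is mostly just masked-loss next-token training on your verified pairs. A minimal sketch with plain PyTorch + transformers (model name and data below are placeholders, not a specific recipe):

```python
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-base-model"  # hypothetical: any causal LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # some tokenizers ship without one
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Toy stand-in for your verified domain corpus of prompt/response pairs.
domain_pairs = [
    {"prompt": "Q: 12 * 7 = ", "response": "84"},
]

def collate(batch):
    # Concatenate prompt + response; compute loss only on response tokens.
    texts = [ex["prompt"] + ex["response"] for ex in batch]
    enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    labels = enc["input_ids"].clone()
    for i, ex in enumerate(batch):
        prompt_len = len(tokenizer(ex["prompt"])["input_ids"])
        labels[i, :prompt_len] = -100  # ignore prompt tokens in the loss
    labels[enc["attention_mask"] == 0] = -100  # ignore padding
    enc["labels"] = labels
    return enc

loader = DataLoader(domain_pairs, batch_size=8, shuffle=True, collate_fn=collate)
model.train()
for batch in loader:
    loss = model(**batch).loss  # standard next-token cross-entropy
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```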
> Can a small team stick to scaling 2024-era best practices?
2024 was the year of SFT. I believe RL that fits reasoning traces to verified final answers is the technique du jour of 2025. Jumping from SFT to RL training might be the biggest gain here, if RL can be applied to your problem (e.g. math, coding, etc.).
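For a flavor of that RL step, here's a toy GRPO-style sketch with a verifiable reward: sample k responses per prompt, score each with a programmatic checker, and weight a REINFORCE-style loss by group-normalized advantages. This is my reading of the general recipe, not anyone's shipped trainer; the verifier and numbers are made up:

```python
import torch

def verify(response: str, gold: str) -> float:
    # Toy verifier: reward 1.0 only if the final line matches the reference
    # answer. Real tasks would swap in unit tests (code), symbolic equality
    # (math), or whatever check your domain labels support.
    lines = response.strip().splitlines()
    return 1.0 if lines and lines[-1] == gold else 0.0

def grpo_loss(logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """logprobs: (k,) summed token log-probs of k sampled responses to one prompt.
    rewards:  (k,) verifier scores for those responses."""
    # Group-normalized advantage: how much better each sample did than its peers.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)
    # REINFORCE-style objective: raise log-prob of above-average samples.
    return -(adv.detach() * logprobs).mean()

# Made-up usage: 4 sampled responses to one prompt, 2 pass the verifier.
logprobs = torch.tensor([-12.3, -15.1, -11.8, -14.0], requires_grad=True)
rewards = torch.tensor([1.0, 0.0, 1.0, 0.0])
loss = grpo_loss(logprobs, rewards)
loss.backward()  # gradients flow only through the policy's log-probs
```

The nice property for a small team: the reward is computed by your verifier, so the same infrastructure that generated your verified data doubles as the RL signal.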