Comment by algo_trader

6 months ago

Are all these "post/mid-training tweaks" important if you have a specific domain with abundant, verified, synthetic data and labels?

Can a small team working on ASI or domain-specific models stick to scaling a 2024-era best-practices training stack? Or will they miss massive improvements?

1 comment

starchild3001 6 months ago

> Are all these post/mid-training tweaks important with abundant, verified, synthetic domain data?

No. Many are aimed at cleaning/aligning noisy, mixed-domain data. With abundant, high-quality domain data, you can skip most of the complexity and focus on direct SFT/RL on your corpus.

> Can a small team stick to scaling 2024-era best practices?

2024 was the year of SFT. I believe fitting reasoning traces to your final responses via RL is the technique du jour of 2025. Jumping from SFT to RL training might be the biggest gain here, if RL can be applied to your problem (e.g. math, coding, etc.).
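To make "if RL can be applied to your problem" a bit more concrete: RL tends to fit when you can score a sampled response automatically. Below is a rough sketch of what such a check might look like for math-style answers, not anyone's actual training stack. The function names, the "last number in the response is the final answer" convention, and the within-group normalization are all assumptions made for illustration.

```python
import re
from statistics import mean, pstdev
from typing import List, Optional


def extract_final_answer(response: str) -> Optional[str]:
    """Return the last number in the response, or None if there is none.

    Assumption: the final answer is the last number the model writes.
    """
    matches = re.findall(r"-?\d+(?:\.\d+)?", response)
    return matches[-1] if matches else None


def verifiable_reward(response: str, reference: str) -> float:
    """Return 1.0 if the extracted answer equals the reference, else 0.0."""
    answer = extract_final_answer(response)
    if answer is None:
        return 0.0
    return 1.0 if float(answer) == float(reference) else 0.0


def group_advantages(rewards: List[float]) -> List[float]:
    """Center and scale rewards across samples drawn for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0.0:
        # Every sample scored the same, so there is nothing to prefer.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


# Four hypothetical samples for one prompt whose verified answer is 42.
samples = [
    "First 12 * 3 = 36, then 36 + 6 = 42. Answer: 42",
    "12 * 3 = 35, plus 6 gives 41",
    "I am not sure.",
    "The result is 42.",
]
rewards = [verifiable_reward(s, "42") for s in samples]
advantages = group_advantages(rewards)
```

The policy update would then push up the likelihood of samples with positive advantage and push down the rest. If you cannot write something like `verifiable_reward` for your domain, that is a hint that SFT on your verified data is the more practical route.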