
Comment by nodja

2 months ago

My intuition is that the harder it is for an LLM to do something during training, the more actual compression/learning gets encoded in its weights. With multi-token prediction/diffusion it becomes much easier to "reward/loss hack" your way through. That won't matter much during pretraining, but I assume a lot of "cheating" will happen in the finetune/RL phase.
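
To make the "easier objective" part concrete, here's a minimal, purely illustrative sketch (my own toy example, not any real model's loss code; shapes, names, and the horizon value are made up) contrasting standard next-token cross-entropy with a toy multi-token loss where each position is graded on several future tokens averaged together. The hedged intuition it's meant to support: averaging over more, and more uncertain, targets gives the model extra slack to push the number down compared to the single hard per-token constraint of next-token prediction.

    # Toy illustration only: shapes, names, and horizon are assumptions
    # made up for this sketch, not any real model's loss implementation.
    import torch
    import torch.nn.functional as F

    vocab, seq_len, horizon = 100, 16, 4

    tokens = torch.randint(0, vocab, (1, seq_len))

    # Standard next-token objective: one prediction per position, graded
    # against exactly one hard target (the next token).
    logits_next = torch.randn(1, seq_len - 1, vocab)
    loss_next = F.cross_entropy(
        logits_next.reshape(-1, vocab),
        tokens[:, 1:].reshape(-1),
    )

    # Toy multi-token objective: each position predicts `horizon` future
    # tokens and the per-target losses are averaged, so the model is graded
    # on a mix of near and far targets at once.
    logits_multi = torch.randn(1, seq_len - horizon, horizon, vocab)
    targets_multi = torch.stack(
        [tokens[:, 1 + k : 1 + k + seq_len - horizon] for k in range(horizon)],
        dim=2,
    )
    loss_multi = F.cross_entropy(
        logits_multi.reshape(-1, vocab),
        targets_multi.reshape(-1),
    )

    print(loss_next.item(), loss_multi.item())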