Comment by pyentropy
21 hours ago
The number of tokens you predict at a time (multi or not) has nothing to do with whether the model wants to emit no, some, or a lot of reasoning tokens inside the reasoning tags -- similar to how branch prediction does not really change a for loop's iteration count.
No, it might. A high-reasoning task is probably harder than a low-reasoning task, so the same MTP LLM will predict more tokens correctly on the low-reasoning task. To compensate for this, big labs likely use different MTP LLMs for different cases. It would make sense for them to do this.
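
To put rough numbers on the "predicts more tokens correctly" part, here is a minimal sketch. It assumes each extra drafted token is accepted independently with the same probability `p`, and that every verification step emits at least one token; both are simplifications I'm making for illustration, not how any particular lab's MTP setup is known to work. The function name and parameters are hypothetical.

```python
def expected_tokens_per_step(p: float, k: int) -> float:
    """Expected tokens emitted per forward pass.

    Assumptions (illustrative only):
      - k extra tokens are drafted per step,
      - each draft is accepted independently with probability p,
      - acceptance stops at the first rejected draft,
      - one token is always emitted, even if every draft is rejected.
    Under these assumptions the expectation is 1 + p + p^2 + ... + p^k.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if p == 1.0:
        # Every draft is accepted, so the geometric sum is just k + 1.
        return float(k + 1)
    return (1.0 - p ** (k + 1)) / (1.0 - p)


if __name__ == "__main__":
    # Hypothetical acceptance rates: lower for a harder task, higher for an easier one.
    for label, p in [("harder task", 0.5), ("easier task", 0.8)]:
        print(label, expected_tokens_per_step(p, k=3))
```

Under these assumptions a lower acceptance rate only reduces how many tokens come out per forward pass, so it affects speed. Whether the acceptance rate really differs between high-reasoning and low-reasoning tasks is the claim above, not something this sketch demonstrates.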