Comment by saurik
2 months ago
> All Distilled and the original R1 versions seem to have accidentally assigned the padding token to <|end▁of▁sentence|>, which is mostly not a good idea, especially if you want to further finetune on top of these reasoning models. This will cause endless infinite generations, since most frameworks will mask the EOS token out as -100.
I couldn't tell if this was an error in the code running the model or in the model weights themselves; if/assuming the former, are these fixes being upstreamed anywhere?
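(For context, a minimal sketch of what the quoted issue looks like in practice, assuming a Hugging Face tokenizer; the checkpoint id and the replacement pad token below are illustrative, not a confirmed fix from Unsloth or DeepSeek:)

```python
from transformers import AutoTokenizer

# Illustrative checkpoint; the quote claims all R1 distilled models shipped this way.
model_id = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# If pad and EOS share the same token id, data collators that set padded
# label positions to -100 also mask out the real EOS label, so a finetuned
# model never learns to emit EOS and keeps generating indefinitely.
if tokenizer.pad_token_id == tokenizer.eos_token_id:
    # One possible workaround before finetuning: register a distinct pad token
    # (the model's embedding matrix then has to be resized to match).
    tokenizer.add_special_tokens({"pad_token": "<|pad|>"})
```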