Comment by cinntaile

1 year ago

I'm gonna guess he tried to reimplement some of the work from the ground up and wondered why certain results looked like they did.

Yep! The goal was to implement Gemma in Unsloth to make finetuning faster and use less VRAM, and my reimplementation seems to get different results from the existing implementations.