Comment by Yenrabbit

1 year ago

Incredible work by the author stepping through all the nitty-gritty details and showing how easy it is to miss something subtle that could degrade performance.

Thanks! :) I'm pushing them into transformers, pytorch-gemma and collabing with the Gemma team to resolve all the issues :)

The RoPE fix should already be in transformers 4.38.2: https://github.com/huggingface/transformers/pull/29285

My main PR for transformers which fixes most of the issues (some still left): https://github.com/huggingface/transformers/pull/29402

  • Incredible indeed! Just hunting down one of these bugs feels like a very time-consuming endeavor.

    What's your approach for these more subtle numerical bugs?

    • Ye it was indeed very gruelling - but very fun!! I used torch.dist everywhere, read all the implementations side by side to compare them, and had to manually inspect losses, plot them etc. It's a bit hard to automate sadly, since new archs cause new issues.
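
For readers wondering what "torch.dist everywhere" might look like in practice, here is a minimal sketch, not the author's actual tooling. It assumes you have already captured matching intermediate tensors from two implementations into dicts keyed by name; the helper `first_divergence`, the tolerance, and the toy float32 vs bfloat16 angle computation are all hypothetical illustrations.

```python
import torch

def first_divergence(reference, candidate, tol=1e-3):
    """Return (name, distance) for the first tensor pair whose L2 distance
    exceeds tol, or None if every pair agrees within tol."""
    for name, ref in reference.items():
        # torch.dist computes the p-norm of (input - other); p defaults to 2.
        distance = torch.dist(ref.float(), candidate[name].float()).item()
        if distance > tol:
            return name, distance
    return None

# Toy stand-in for two implementations: position * frequency angles
# computed once in float32 and once in bfloat16 (an assumed example).
positions = torch.arange(8192, dtype=torch.float32)
inv_freq = 1.0 / (10000 ** (torch.arange(0, 8, 2, dtype=torch.float32) / 8))

ref_angles = torch.outer(positions, inv_freq)
low_angles = torch.outer(
    positions.to(torch.bfloat16), inv_freq.to(torch.bfloat16)
).float()

reference = {"embed": torch.ones(4), "rope_angles": ref_angles}
candidate = {"embed": torch.ones(4), "rope_angles": low_angles}

result = first_divergence(reference, candidate)
print(result)  # name and distance of the first mismatching tensor
```

Walking the tensors in forward-pass order means the first name reported is the earliest place the two implementations disagree, which narrows down where to start reading the code side by side.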