Comment by Yenrabbit
1 year ago
Incredible work by the author stepping through all the nitty-gritty details and showing how easy it is to miss something subtle that could degrade performance.
Thanks! :) I'm pushing them into transformers, pytorch-gemma and collabing with the Gemma team to resolve all the issues :)
The RoPE fix should already be in transformers 4.38.2: https://github.com/huggingface/transformers/pull/29285
My main PR for transformers which fixes most of the issues (some still left): https://github.com/huggingface/transformers/pull/29402
Incredible indeed! Just hunting down one of these bugs feels like a very time consuming endeavor.
What's your approach for these more subtle numerical bugs?
I'm gonna guess he tried to reimplement some of the work from the ground up and wondered why certain results looked like they did.
Ye it was indeed very gruelling - but very fun!! I used torch.dist everywhere, read all implementations side by side to compare them, and had to manually inspect losses, plot them etc. It's a bit hard to automate sadly, since new archs cause new issues.
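For anyone wondering what "torch.dist everywhere" might look like in practice, here is a minimal sketch. It is not the author's actual code: the helper `first_divergence`, the `tol` threshold, and the toy tensors are all made up for illustration. The idea is to record intermediate tensors from two implementations under the same names, then walk through them and report the first one whose L2 distance exceeds a tolerance. The "candidate" here just rounds one result through bfloat16 to simulate a precision bug.

```python
import torch


def first_divergence(reference, candidate, tol=1e-3):
    """Return (name, distance) for the first tensor pair whose L2 distance
    exceeds tol, or None if every pair agrees within tol.

    Hypothetical helper: both arguments are dicts mapping a stage name
    to the tensor recorded at that stage.
    """
    for name, ref in reference.items():
        # torch.dist returns the p-norm of (input - other); p defaults to 2.
        distance = torch.dist(ref.float(), candidate[name].float()).item()
        if distance > tol:
            return name, distance
    return None


torch.manual_seed(0)
x = torch.randn(4, 8)
w = torch.randn(8, 8)

# Reference path: everything stays in float32.
reference = {"input": x, "proj": x @ w}

# Candidate path (assumed bug): the projection is rounded to bfloat16.
candidate = {"input": x.clone(), "proj": (x @ w).bfloat16()}

result = first_divergence(reference, candidate)
print(result)
```

The inputs match exactly, so the first stage flagged is `proj`. Picking a sensible `tol` per stage is the part that is hard to automate, which fits the point above about new archs causing new issues.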