Comment by gdiamos, 1 year ago:
RNNs always had better scaling law curves than transformers. BPTT (backpropagation through time) was their problem.