Comment by teravor
17 hours ago
Is that actually how they train them in the datacenter? The trillion-parameter weight vector gets cloned, sent off to groups of GPUs, and averaged afterward?
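To make the question concrete, here is a minimal sketch of the scheme it describes, next to the usual alternative of averaging gradients at every step. Everything in it is an assumption for illustration: the toy 1-D model `y = w * x`, plain SGD, and the hypothetical helper names `local_steps`, `average_after`, and `average_gradients`. It does not show how any particular datacenter trains models.

```python
def grad(w, batch):
    # Gradient of mean squared error for the toy model y = w * x.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def local_steps(w, batch, lr, steps):
    # One worker: start from a clone of the shared weight and train locally.
    for _ in range(steps):
        w = w - lr * grad(w, batch)
    return w

def average_after(w, shards, lr, steps):
    # Clone w to every worker, train independently, then average the weights.
    replicas = [local_steps(w, shard, lr, steps) for shard in shards]
    return sum(replicas) / len(replicas)

def average_gradients(w, shards, lr):
    # Alternative: average the workers' gradients, then apply one shared update.
    g = sum(grad(w, shard) for shard in shards) / len(shards)
    return w - lr * g

# Two hypothetical workers, each holding its own shard of (x, y) pairs.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w0 = 0.0

one_step_weights = average_after(w0, shards, lr=0.01, steps=1)
one_step_grads = average_gradients(w0, shards, lr=0.01)
many_local_steps = average_after(w0, shards, lr=0.01, steps=5)
```

With a single local step of plain SGD from the same starting weight, averaging the weights afterward gives the same result as averaging the gradients, so `one_step_weights` and `one_step_grads` agree up to floating-point error. The two only diverge once workers take several local steps before averaging, as in `many_local_steps`.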