Comment by YetAnotherNick
5 hours ago
Depends on whether you are using tensor parallelism or pipeline parallelism. In the second case you don't need any sharing.