Comment by YetAnotherNick
9 hours ago
Depends on whether you are using tensor parallelism or pipeline parallelism. In the second case you don't need any sharing.
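To make the distinction concrete, here is a toy sketch in plain Python. It assumes "sharing" means devices exchanging partial results within a single layer; the function names (`matmul`, `tensor_parallel_layer`, `pipeline_parallel`) are made up for illustration, and "devices" are just list slices, not real GPUs.

```python
def matmul(x, w):
    """Multiply a vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def tensor_parallel_layer(x, w, num_devices):
    """One layer split column-wise across devices (hypothetical helper).

    Assumes the number of columns is divisible by num_devices.
    """
    step = len(w[0]) // num_devices
    # Each device holds only its own column slice of the weight matrix.
    shards = [[row[d * step:(d + 1) * step] for row in w] for d in range(num_devices)]
    # Each device computes a slice of the output from the same input.
    partials = [matmul(x, shard) for shard in shards]
    # The slices must be gathered from every device before the next layer
    # can run: this is the per-layer exchange between devices.
    return [value for part in partials for value in part]


def pipeline_parallel(x, layers, num_devices):
    """Whole layers assigned to devices (hypothetical helper).

    Assumes len(layers) is divisible by num_devices.
    """
    per_stage = len(layers) // num_devices
    stages = [layers[d * per_stage:(d + 1) * per_stage] for d in range(num_devices)]
    for stage in stages:
        # A stage runs its layers locally; only the activation vector x
        # is handed on to the next stage.
        for w in stage:
            x = matmul(x, w)
    return x


w1 = [[1, 2], [3, 4]]
w2 = [[0, 1], [1, 0]]
x = [1, 1]

print(tensor_parallel_layer(x, w1, num_devices=2))
print(pipeline_parallel(x, [w1, w2], num_devices=2))
```

In the tensor-parallel version every layer ends with a gather across all devices, while in the pipeline version each device works on its own layers and only forwards the activations to the next stage.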