Comment by YetAnotherNick
13 days ago
It depends on whether you are using tensor parallelism or pipeline parallelism. In the second case you don't need any sharing.
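A rough toy sketch of the distinction, in plain Python with two simulated "devices". It reads "sharing" as the exchange of partial results between devices inside each layer, which is an assumption about the thread context. The weights, the function names, and the exchange counters are all made up for illustration and do not come from any real framework.

```python
# Toy model: two linear layers, two simulated "devices" (no real GPUs involved).
# All names and numbers here are hypothetical, chosen only for illustration.

def matmul(x, w):
    """Multiply a vector x (length n) by a matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]

def split_columns(w, parts):
    """Split matrix w column-wise into `parts` equal shards."""
    step = len(w[0]) // parts
    return [[row[k * step:(k + 1) * step] for row in w] for k in range(parts)]

W1 = [[1, 2, 3, 4], [5, 6, 7, 8]]        # layer 1: 2 inputs -> 4 outputs
W2 = [[1, 0], [0, 1], [1, 1], [2, -1]]   # layer 2: 4 inputs -> 2 outputs
x = [1, 2]

def reference(x):
    """Single-device result, used to check both schemes."""
    return matmul(matmul(x, W1), W2)

def tensor_parallel(x):
    """Each layer is sharded across both devices; outputs are gathered per layer."""
    exchanges = 0
    h = x
    for w in (W1, W2):
        shards = split_columns(w, 2)
        partials = [matmul(h, shard) for shard in shards]  # one partial per device
        h = partials[0] + partials[1]  # gather step: devices exchange their halves
        exchanges += 1                 # happens once for every layer
    return h, exchanges

def pipeline_parallel(x):
    """Each device owns a whole layer; only activations cross the stage boundary."""
    h = matmul(x, W1)   # device 0 runs layer 1 on its own
    handoffs = 1        # activations are sent once to device 1
    y = matmul(h, W2)   # device 1 runs layer 2 on its own
    return y, handoffs

print(tensor_parallel(x))    # ([68, 11], 2)
print(pipeline_parallel(x))  # ([68, 11], 1)
```

Both schemes give the same output as the single-device run. In the tensor-parallel version the devices have to combine results inside every layer, while in the pipeline version each device works alone and only hands activations to the next stage.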