Comment by augment_me
1 day ago
NVIDIA chips are more versatile. During training you might need to schedule work to the SFU (the Special Function Unit that handles sin, cos, 1/sqrt(x), etc.), run epilogues, save intermediate computations, save gradients, and so on. Training also means collecting data from many GPUs, so you need to support interconnects, remote SMEM writes, etc.
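To make "schedule work to the SFU" concrete, here is a toy CUDA kernel (my own sketch, not from any real codebase; the __sinf/__cosf/rsqrtf intrinsics are real fast-math intrinsics that the compiler lowers to SFU instructions):

    #include <cuda_runtime.h>

    // Toy elementwise kernel. The __sinf/__cosf/rsqrtf intrinsics are
    // serviced by the SM's Special Function Units rather than the regular
    // FP32 pipelines, so this work is scheduled to separate hardware.
    __global__ void sfu_demo(const float* in, float* out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            float x = in[i];
            out[i] = __sinf(x) + __cosf(x) + rsqrtf(x * x + 1.0f);
        }
    }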
Once training is done, you are left with feed-forward networks of frozen weights that you can just program in and run data over. Those weights can be duplicated across any number of devices and simply sit there, running inference on new data.
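As a rough sketch of that duplicate-and-serve deployment (again my own illustration: replicate_weights and its arguments are hypothetical, but the CUDA runtime calls are real):

    #include <cuda_runtime.h>
    #include <vector>

    // Copy one frozen weight buffer onto every visible device; each
    // resident copy can then serve inference requests independently.
    std::vector<float*> replicate_weights(const float* host_weights, size_t n) {
        int device_count = 0;
        cudaGetDeviceCount(&device_count);
        std::vector<float*> copies(device_count, nullptr);
        for (int d = 0; d < device_count; ++d) {
            cudaSetDevice(d);                        // target device d
            cudaMalloc(&copies[d], n * sizeof(float));
            cudaMemcpy(copies[d], host_weights, n * sizeof(float),
                       cudaMemcpyHostToDevice);      // duplicate the weights
        }
        return copies;
    }

(Error handling omitted for brevity.)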
If inference over frozen models turns out to be the dominant use case for NNs (it is today), then Google is better positioned.
All of those are things you can do with TPUs.
Won't the need for training increase as demand for specialized, smaller models grows and we need to train their many variations? Also, what about models that continuously learn/(re)train? It seems to me the need for training will only go up.
That's the thing: nobody knows. LLM architectures are constantly evolving, and people are trying all kinds of things.