Comment by DaedalusII
12 days ago
the first problem is that a whole generation of people learned to code ai applications by fiddling around with the gpu in their gaming pc 10 years ago, so an entire generation of talent grew up with cuda
the second problem is that so many libraries and so much existing software are cuda only. even some obscure hardware stuff. i discovered the hard way that some AMD thinkpads dont support thunderbolt transfer speeds on their usb-c ports, whereas nvidia ones do
the third problem is that the cost to develop a cuda equivalent is so great that its cheaper for companies like google to make TPU and amazon to make Trainium. its literally cheaper to make an entire new chipset than it is to fix AMD. i dont see companies like apple/amzn/goog etc fixing AMDs chips
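to make the second point concrete, here is a minimal sketch of why "cuda only" code is so sticky, assuming PyTorch as the library (the `torch.cuda.is_available()` check is the real torch API; the fallback wrapper is illustrative, and the sketch degrades to cpu if torch isnt installed):

```python
# Sketch of the lock-in: a decade of tutorials and codebases hard-code
# the "cuda" device string, so on non-NVIDIA hardware the same code
# silently falls back to CPU (or crashes) unless someone ports it.
try:
    import torch
    HAVE_TORCH = True
except ImportError:  # keep the sketch runnable without torch installed
    HAVE_TORCH = False

def pick_device() -> str:
    # the pattern everywhere in existing code: CUDA or bust
    if HAVE_TORCH and torch.cuda.is_available():
        return "cuda"
    return "cpu"

print(pick_device())
```

(AMDs ROCm builds of pytorch paper over some of this by reusing the `torch.cuda` namespace via HIP, but plenty of libraries still assume NVIDIA specifics underneath, which is the point.)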
>its literally cheaper to make an entire new chipset than it is to fix AMD
Is it? Or is it that AMD expects to make a profit, so it's cheaper to make your own chips at cost?
i mean its cheaper from an enterprise customer perspective. if a company is training an LLM, rewriting their training programs to target AMDs hardware instead of just using CUDA is so expensive and time consuming that its cheaper to pay four times the price and use nvidia hardware. in this space its important to move fast, although those economics will shift over time
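the tradeoff above can be sketched as back-of-envelope arithmetic. every number here is made up for illustration except the 4x hardware premium claimed in this thread:

```python
# Hypothetical break-even sketch: when does paying the NVIDIA premium
# beat porting an existing CUDA stack to AMD? All costs in relative
# "units of AMD hardware spend"; only the 4x premium comes from the thread.
nvidia_hw_cost = 4.0   # the claimed 4x premium over AMD
amd_hw_cost    = 1.0
port_cost      = 5.0   # one-off engineering cost to move off CUDA (made up)
delay_cost     = 2.0   # opportunity cost of shipping months later (made up)

cuda_total = nvidia_hw_cost                        # buy NVIDIA, run code as-is
amd_total  = amd_hw_cost + port_cost + delay_cost  # cheaper chips, costly port

print("CUDA route:", cuda_total)   # CUDA route: 4.0
print("AMD route:", amd_total)     # AMD route: 8.0
print("CUDA wins" if cuda_total < amd_total else "AMD wins")
```

the interesting part is that `port_cost` is paid once while the hardware premium is paid every generation, which is one way to read "that economics will shift over time".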
which is why nvidia hardware trades at a 4x premium to AMD
its not necessarily cheaper to make chips at cost either. nobody here is making them, only designing them. so first you have to design your new chip, then you have to place a minimum order with a chip fab big enough to compete on unit economics, and finally you have to get your dev team to write a CUDA equivalent, a software problem so hard its only really been solved by apple, google, intel, and nvidia
only companies with big fab orders can get priority too. if a company did all of the above and was ready to go, they probably wouldn't get fab capacity until 2030