Comment by 01100011
11 hours ago
On point 5, I think this is the real moat for CUDA. Does Google have tools to optimize kernels on their TPUs? Do they have tools to optimize successive kernel launches on their TPUs? How easy is it to debug on a TPU (arguably CUDA could use work here too, but still...)? Does Google help me fully utilize their TPUs? Can I warm up a model on a TPU, checkpoint it, and launch from the checkpoints to save time?
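[For context: the JAX side of several of these questions looks roughly like the sketch below, assuming a recent JAX release on a TPU host. The profiler trace covers per-kernel inspection, and the persistent compilation cache is the closest analog to "warm up, checkpoint, relaunch." The directory paths are illustrative, not defaults.]

```python
# Sketch, assuming a recent JAX release; cache/trace paths are illustrative.
import jax
import jax.numpy as jnp

# Persistent compilation cache: compiled XLA executables are written to disk,
# so a relaunched process can skip the expensive warm-up compile.
jax.config.update("jax_compilation_cache_dir", "/tmp/jax_cache")

@jax.jit
def step(w, x):
    # XLA fuses these ops into a small number of TPU kernels,
    # which also amortizes successive-launch overhead.
    return jnp.tanh(x @ w)

w = jnp.ones((1024, 1024))
x = jnp.ones((8, 1024))

# Profiler trace: viewable in TensorBoard/Perfetto, showing per-kernel TPU time.
with jax.profiler.trace("/tmp/tpu_trace"):
    step(w, x).block_until_ready()
```

[Whether that toolchain is as mature as Nsight/CUDA Graphs is exactly the commenter's question.]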
I am fairly pro-Google (they invented the transformer behind LLMs, FFS...) and recognize the advantages (price per token, efficiency, vertical integration, established data centers with power allocations), but I also know they have a habit of slightly sucking at everything but search.