Comment by sigbottle
3 hours ago
Oh wow, there's still work being done on Ampere?
I was wondering: I've been thinking about switching to AI systems programming (I know, an easy task), but from what I understand, industry cloud GPUs are the main winners, right? Nobody's going to pay me (assuming I even had the skills) to optimize for consumer GPUs?
From what I understand, it's not just a matter of core counts, memory capacity, and raw performance; the core primitives themselves differ. I don't think any of the "Blackwell" chips like the Grace one or the RTX 5090 have, for example, SM pairs in their ISA? And there are similar fundamental differences between consumer and cloud Hopper (where much of the performance comes from the cloud part's ISA?)
So I guess I'm wondering: should I buy a GPU myself, or just rent on the cloud, if I want to start getting some experience in this field? How do you even get experience in this normally anyway? Do you get into really good schools and into their AI labs, which have a lot of funding?
Why does publishing papers require the latest and greatest GPUs? My understanding is that the paper talks about very general principles.
> So I guess I'm wondering if I should buy a GPU myself or should I just rent on the cloud if I wanted to start getting some experience in this field. How do you even get experience in this normally anyways, do you get into really good schools and into their AI labs which have a lot of funding?
Unless you have money to throw around, you'd be better off starting to work on something: write some code and get it running on a leased GPU before deciding on a long-term plan.
I do CUDA for a living (not inference) and for the life of me (and a couple of LLMs for that matter) I cannot figure out what you mean by "SM pairs".
Do you mean the coupled dies on chips like the B200? If so, note that an NVIDIA die has many SMs, not a pair.
Do you mean TMEM MMA cooperative execution? I'm guessing that must be it given what the paper is about.
https://hazyresearch.stanford.edu/blog/2025-03-15-tk-blackwe...
Cooperative execution, yeah.
As you can tell, I do not do CUDA for a living :D
I still have 2x NVLinked A6000s, and they aren't that bad compared to a single RTX 6000 Pro.
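For a rough sense of why the pair holds up, here is some back-of-envelope arithmetic from approximate spec-sheet numbers (ballpark figures, worth double-checking against NVIDIA's datasheets before relying on them):

```python
# Approximate spec-sheet numbers; treat as ballpark, not gospel.
a6000_vram_gb = 48     # RTX A6000: 48 GB GDDR6 each
a6000_bw_gbs = 768     # roughly 768 GB/s memory bandwidth each
nvlink_gbs = 112.5     # NVLink bridge between the pair (far slower than VRAM)
pro6000_vram_gb = 96   # RTX PRO 6000 Blackwell: 96 GB GDDR7
pro6000_bw_gbs = 1792  # roughly 1.79 TB/s

# Capacity parity: the pair matches the Pro card's 96 GB.
total_vram = 2 * a6000_vram_gb
print(total_vram)  # 96

# If the workload shards cleanly across both GPUs (and rarely crosses
# the slow NVLink bridge), aggregate bandwidth is in the same ballpark.
ratio = 2 * a6000_bw_gbs / pro6000_bw_gbs
print(round(ratio, 3))  # 0.857
```

The caveat is the NVLink hop: anything that has to move activations between the two cards pays for a link that is an order of magnitude slower than local VRAM.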
Look at the email addresses. If you'll recall, there's an export embargo on China.
Yep, https://github.com/poad42/cuda-fp8-ampere is another recent attempt at squeezing whatever's left out of Ampere.
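Since Ampere tensor cores have no native FP8 support, projects like that one typically keep data in FP8 in memory and upconvert to FP16/BF16 for the actual MMA. For anyone unsure what the FP8 formats even look like, here's a minimal pure-Python sketch of the OCP E4M3 format (illustrative only, not code from that repo):

```python
import math

# OCP FP8 E4M3: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits.
# No infinities; S.1111.111 encodes NaN; max finite value is 448.

def fp8_e4m3_decode(code: int) -> float:
    """Decode an 8-bit E4M3 code to a Python float."""
    sign = -1.0 if (code >> 7) & 1 else 1.0
    exp = (code >> 3) & 0xF
    man = code & 0x7
    if exp == 0xF and man == 0x7:
        return math.nan
    if exp == 0:
        # Subnormal: man/8 * 2^-6
        return sign * man * 2.0 ** -9
    return sign * (1 + man / 8.0) * 2.0 ** (exp - 7)

def fp8_e4m3_encode(x: float) -> int:
    """Nearest finite E4M3 code. Brute force is fine for illustration;
    real kernels use round-to-nearest-even bit manipulation instead."""
    finite = (c for c in range(256) if not math.isnan(fp8_e4m3_decode(c)))
    return min(finite, key=lambda c: abs(fp8_e4m3_decode(c) - x))

print(hex(fp8_e4m3_encode(1.0)))              # 0x38
print(fp8_e4m3_decode(fp8_e4m3_encode(0.3)))  # 0.3125, the nearest representable
```

With only 3 mantissa bits, quantization error is large (0.3 lands on 0.3125), which is why these kernels carry per-block scale factors alongside the raw FP8 bytes.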