
Comment by skerit

4 hours ago

> GPU compute for self-study

Those suggestions they make for a B200 start at $4.99 an hour.

Is that really required for starting out? I've been tinkering with my own from-scratch LLM, but in the early phases I don't need anything more than a 4090 on Vast.ai.

TA here. Definitely not! In fact we explicitly added sections in the first assignment to allow for scaling down to even local compute (M-series GPUs). For assignment 2 there are a few regions that require Triton support for your GPU, but everything can be adapted for much cheaper GPUs.

We were lucky enough to get Blackwell GPUs for Stanford students this year, which is why the writeups are written mostly around them.

I imagine it's a lot like FPGAs:

- the hardware you need for a production use-case is relatively small, because production {models, bitstreams} have been heavily size-optimized, stripping out everything not needed to get a good result for the target use-cases

- but the hardware you need when tinkering/learning how to design {compute kernels, IP blocks} in the first place, must be quite a bit more powerful / higher-capacity, because your experiments will intentionally be the opposite of optimized: they'll be built for legibility / introspectability / debuggability at every level, which massively inflates and de-optimizes the resulting {model, bitstream}.

(And, to be clear here, "running someone else's finished model, which was designed and optimized to be used on something like a 4090, against your own prompt" is a kind of experimenting, which is cheap, in the same way that "deploying someone else's pre-baked FPGA bitstream, that was designed and synthesized for a $20 target FPGA, onto your own instance of that $20 FPGA, and then feeding your own input signals to it" is cheap. But that's not the kind of experimenting you'd be doing in this course while learning to design your own models!)

You're right to be sceptical. I have trained reasonably good SLMs for the TinyStories dataset on my 4060Ti (16GB) with no problems. You'll only run into problems if you want to test whether your ideas scale up to models any bigger than "arguably tiny".
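For a rough sense of where "arguably tiny" stops fitting, here is a back-of-the-envelope sketch. It assumes plain fp32 training with Adam (one copy each of weights and gradients plus two moment buffers, so 16 bytes per parameter) and ignores activations, which depend on batch size and sequence length. The function name and the parameter counts are made up for illustration, not taken from the course.

```python
def training_state_gib(n_params: int, bytes_per_value: int = 4) -> float:
    """Rough memory for weights + gradients + Adam moments, in GiB.

    Assumption: everything is stored at the same precision (fp32 by default)
    and activation memory is NOT included.
    """
    # weights + gradients + Adam first moment + Adam second moment
    copies = 4
    return n_params * bytes_per_value * copies / 2**30


# Hypothetical model sizes, chosen only to show the trend.
for n_params in (30_000_000, 125_000_000, 1_000_000_000):
    gib = training_state_gib(n_params)
    print(f"{n_params / 1e6:>6.0f}M params: {gib:6.2f} GiB before activations")
```

Under these assumptions a model in the tens of millions of parameters needs well under 1 GiB of training state, while a 1B-parameter model needs roughly 15 GiB before any activations, which is about where a 16GB card stops being comfortable.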

It seems strange that the required resources aren't provided by the educational institution?

  • We do provide resources for enrolled students. The online suggestions are for external students or Stanford students who we weren't able to admit.

  • Two schools of thought. First: people are paying 100K per year, so we should provide everything. Second: they are paying 100K per year, so do you think they will care about a couple of hundred more?

I believe these are affordable enough for the intended audience (which is Stanford undergrads and master's students).

  • For them, Modal is sponsoring the compute, as stated on the website. The prices are for remote followers.