Comment by fg137
2 hours ago
I recently completed the 2025 version of this course (video + most assignments, skipping some of the most costly parts of the tasks). That's quite something. There is a lot going on in the first two assignments, which required a ton of thinking and debugging. Despite having a decent foundation in deep learning, it took me several months to finish it using bits of my after-work hours and weekends. (I am not a model part-time student by any means, and sometimes I didn't get to work on this for days, but it could have been much worse.) Hard to imagine how enrolled Stanford students manage to submit assignments on a two-week cadence.
Coming back to the course, kudos to the course staff, including professors and TAs. They obviously put a ton of thought into designing the course, putting together slides that contain the latest updates in the field, and preparing the wonderful assignments. You get to create a real LM and explore other important parts of the LLM pipeline from small building blocks, validate each step, and see for yourself how everything comes together. You can really feel a sense of achievement after completing the assignments.
That said, while the staff obviously put a lot of effort into making this accessible to everyone, I wish they had made a bit more effort to clarify the environment requirements. Their harness works best in a Linux environment with an NVIDIA GPU, which may be taken for granted by researchers but is rare for a home computer setup. Their setup also expects specific CUDA versions and/or architectures. For following along at home, the next best setup is Windows with WSL2 + an NVIDIA GPU, plus leased GPUs on various platforms, none of which is exactly trivial (or cheap, for that matter). It would be nice if the staff could put together a bit more guidance in that area, especially on how someone without any compatible GPU can make the most of the course. (One thing I learned is that if you use macOS and are not careful about memory analysis, your Python code could freeze your machine and force a reboot.)
TA here. Noted! I now have more resources to test more environments, and will do so whenever possible. I think freezing due to memory overuse is going to be a problem with anything you code yourself, but I do think we could be more rigorous with guiding people to achieve limited memory use for the tokenizer task.
IMO the cost of renting GPUs is a bit overstated in these comments. Generally, almost all of the development can be done locally and then run for a short period of time on on-demand GPUs. For assignment 1, you can run everything on your local machine, even if you don't have a GPU. For A1 and A2, you can do (most of) the tasks with only a few hours of renting. If you are not too careful, using rental GPUs throughout will cost you around $200 in compute, but you can easily get this under $50 if you're willing to scale down many of the problems. I think we could work on making this clear and charting what these changes are.
If you have further feedback or encounter problems, feel free to open issues in the repos so we can resolve them! It's hard for us to fix issues we're not aware of.
Memory overuse: for context, it's about parallelism on the gloo backend with CPU. My observation is that on Linux, the same (bad) Python code will result in the process getting killed quickly, saving the user the trouble of rebooting. Not sure if the macOS behavior is expected in the first place.
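For what it's worth, here is a rough sketch of the kind of guard that could make a runaway worker fail fast instead of freezing the machine, using only the Python standard library. The function names and the idea of a fixed byte budget are my own assumptions, not part of the course harness, and each worker process would have to call the check itself (say, once per batch). It looks at peak resident memory, so it can only catch gradual growth between checks, not a single huge allocation.

```python
import resource
import sys


def peak_rss_bytes() -> int:
    """Return the peak resident set size of this process, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    return peak if sys.platform == "darwin" else peak * 1024


def check_memory_budget(limit_bytes: int) -> None:
    """Raise MemoryError once this process has exceeded a (hypothetical) budget."""
    used = peak_rss_bytes()
    if used > limit_bytes:
        raise MemoryError(
            f"peak RSS {used / 2**30:.2f} GiB exceeds "
            f"budget {limit_bytes / 2**30:.2f} GiB"
        )
```

Calling something like `check_memory_budget(8 * 2**30)` inside the work loop would abort that worker with a `MemoryError` once it has used more than 8 GiB, which is closer to what the Linux behavior gives you for free.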
GPU cost: most of us will spend at least a few hours troubleshooting to get started on a leased GPU, including but not limited to figuring out how much storage is needed, whether the CUDA version works well, etc. Going without a GPU is definitely possible but difficult. Plus, one issue might be that most of us just don't have enough experience working with leased GPUs, resulting in more time spent figuring things out.
GitHub issues: noted, will open an issue for anything I can think of.