
Comment by marcelroed

1 hour ago

TA here. Noted! I now have more resources to test more environments, and will do so whenever possible. I think freezing due to memory overuse is going to be a problem with anything you code yourself, but I do think we could be more rigorous about guiding people toward bounded memory use for the tokenizer task.
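To make the "limited memory use" point concrete, here is a minimal sketch of streaming a large text file in fixed-size blocks instead of reading it all at once. It is not the assignment's tokenizer: whitespace word counting stands in for the real pre-tokenization step, and the names `iter_text_blocks`, `count_words`, and `block_chars` are made up for illustration.

```python
import io
from collections import Counter
from typing import Iterator, TextIO


def iter_text_blocks(stream: TextIO, block_chars: int = 1 << 20) -> Iterator[str]:
    """Yield pieces of `stream` that end on a space or newline.

    Only about one block is held in memory at a time. The unfinished word at
    the end of a block is carried over to the next one, so no word is split.
    (A single "word" longer than a block simply keeps accumulating.)
    """
    carry = ""
    while True:
        block = stream.read(block_chars)
        if not block:
            break
        block = carry + block
        cut = max(block.rfind(" "), block.rfind("\n"))
        if cut == -1:
            # No boundary yet: keep accumulating until one shows up.
            carry = block
            continue
        yield block[: cut + 1]
        carry = block[cut + 1 :]
    if carry:
        yield carry


def count_words(stream: TextIO, block_chars: int = 1 << 20) -> Counter:
    """Count whitespace-separated words without loading the whole file."""
    counts: Counter = Counter()
    for block in iter_text_blocks(stream, block_chars):
        counts.update(block.split())
    return counts


# Tiny in-memory example; a real run would pass open(path, encoding="utf-8").
demo = io.StringIO("a bb a\nccc bb a")
print(count_words(demo, block_chars=4))
```

Peak memory here depends on the block size and the number of distinct words, not on the file size.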

IMO the cost of renting GPUs is a bit overstated in these comments. Almost all of the development can be done locally and then run for a short period on on-demand GPUs. For assignment 1, you can run everything on your local machine, even if you don't have a GPU. For A1 and A2, you can do (most of) the tasks with only a few hours of renting. If you aren't too careful, using rental GPUs throughout will cost you around $200 in compute, but you can easily get this under $50 if you're willing to scale down many of the problems. I think we could work on making this clear and documenting what those scaled-down changes are.

If you have further feedback or encounter problems, feel free to open issues in the repos so we can resolve them! It's hard for us to fix issues we're not aware of.

Memory overuse: for context, it's about parallelism on the gloo backend with CPU. My observation is that on Linux, the same (bad) Python code gets the process killed quickly, saving the user the trouble of rebooting. Not sure if the macOS behavior is expected in the first place.
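One way to get the "fail fast instead of freezing" behavior on purpose is to cap the process's address space before running the heavy code. The sketch below uses Python's standard `resource` module (Unix only). The helper name `cap_address_space` and the 4 GiB figure are arbitrary illustrations. On Linux an oversized allocation should then raise `MemoryError`; whether macOS enforces this limit is an open question here, so treat that part as an assumption to test.

```python
import resource


def cap_address_space(max_bytes: int) -> tuple[int, int]:
    """Set the soft address-space limit; return the previous (soft, hard) pair.

    Only the soft limit is changed, so it can be raised again later, up to
    the hard limit. The limit applies per process, not to a group of workers.
    """
    previous = resource.getrlimit(resource.RLIMIT_AS)
    _, hard = previous
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))
    return previous


# Example (not executed here): call this at the top of a worker script.
# cap_address_space(4 * 1024**3)  # 4 GiB, an arbitrary example value
```

Child processes inherit the limit, but each one gets its own cap, so several workers together can still use several times the per-process limit.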

GPU cost: most of us will spend at least a few hours troubleshooting to get started on a leased GPU, including but not limited to figuring out how much storage is needed, whether the CUDA version works, etc. Going without a GPU is definitely possible but difficult. Plus, one issue might be that most of us just don't have much experience working with GPUs, so we spend more time figuring things out.

GitHub issues: noted, I'll open an issue for anything I can think of.