Comment by nickpsecurity

1 day ago

I've been collecting papers on training models on small numbers of GPUs. What I look for is (a) the type of GPU, (b) how many were used, and (c) how long the run took. I can quickly establish a minimum cost from that.
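As a rough sketch of that arithmetic: the floor is GPU count times run hours times an hourly rental rate. The function name and the $2.00 per GPU-hour rate below are placeholders for illustration, not figures from any paper or cloud provider; plug in the current price for the GPU type the paper reports.

```python
def minimum_training_cost(num_gpus: int, hours: float, usd_per_gpu_hour: float) -> float:
    """Lower-bound rental cost: GPU count x wall-clock hours x hourly rate."""
    if num_gpus <= 0 or hours < 0 or usd_per_gpu_hour < 0:
        raise ValueError("num_gpus must be positive; hours and rate must be non-negative")
    return num_gpus * hours * usd_per_gpu_hour


# Hypothetical run: 8 GPUs for 24 hours at an assumed $2.00 per GPU-hour.
cost = minimum_training_cost(num_gpus=8, hours=24, usd_per_gpu_hour=2.00)
print(f"${cost:,.2f}")  # $384.00
```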

I say minimum because there's also pre-processing the data, setting up the machine configuration, trial runs on small data to make sure it's working, repeats during the main run if failures happen, and any time spent loading or offloading data (e.g. checkpoints) from the GPU instance. So the numbers in the papers are a nice minimum rather than the actual cost of a replication, which is highly specific to one's circumstances.

Sure... but they provide all that too. They're just saving most people extra work. And honestly, I think it is nice to have a historical record. There are plenty of times I'm looking in papers for numbers that didn't seem relevant at the time but are later. Doesn't hurt.