Comment by gliptic

1 year ago

Your best bet for 33B, if you already have a computer, is buying a used RTX 3090 for <$1k. I don't think there are currently any cheap options for 70B that would give you >5 tokens/s. High memory bandwidth is just too expensive. Strix Halo might give you >5 tokens/s once it comes out, but it will probably cost significantly more than $1k for 64 GB of RAM.
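For a rough sense of why memory bandwidth is the bottleneck: generating each token has to stream essentially all of the model's weights from memory, so a simple upper bound on generation speed is bandwidth divided by model size. The figures below are ballpark assumptions for illustration, not benchmarks.

```python
# Back-of-the-envelope ceiling on token generation speed:
# every generated token reads (roughly) all weights once, so
#   tokens/s <= memory bandwidth / model size.
# Bandwidth and size figures are illustrative assumptions.

def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

# Used RTX 3090 (~936 GB/s) running a 33B Q4 model (~20 GB):
print(max_tokens_per_sec(936, 20))  # ~47 tok/s ceiling

# Dual-channel DDR5 on the CPU (~90 GB/s) running a 70B Q4 model (~40 GB):
print(max_tokens_per_sec(90, 40))   # ~2 tok/s ceiling
```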

With used GPUs, do you have to be concerned that they're close to EOL due to heavy utilization in a Bitcoin-mining or AI rig?

  • I guess it becomes a bigger issue the longer it's been since they stopped making them, but most people I've heard from (including me) haven't had any issues. Crypto rigs don't necessarily wear GPUs out faster, because miners care about power consumption and run the cards at a fairly even temperature. What probably breaks first is the fans. You might also have to open the card up and repaste/repad it to keep the cooling under control.

M4 Mac with unified memory shared between the CPU and GPU

Not very cheap though! But you get quite a usable personal computer with it...

How does inference happen on a GPU whose memory is so limited compared with the model's full requirements? This is something I've been wondering about for a while.

  • You can run a quantized version of the model to reduce the memory requirements, and you can do partial offload, where some of the model sits on the GPU and some on the CPU (see the sketch after this thread). If you are running a 70B at Q4, that's 40-ish GB including some context cache, and you can offload at least half of it onto a 3090, which will run its portion of the load very fast. It makes a huge difference even if you can't fit every layer on the GPU.

    • So the more GPU memory we have, the faster it will be, and the model doesn't have to run solely on the CPU or the GPU; the two can be combined. Very cool. I think that's how it's running now with my single 4090.
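A minimal sketch of quantization plus partial offload, assuming the llama-cpp-python bindings; the model path, layer count, and context size are placeholders, and how many layers actually fit depends on the quant and the card.

```python
from llama_cpp import Llama

# A 70B model quantized to Q4 is roughly 40 GB, so it cannot fit entirely in
# a 24 GB card. n_gpu_layers picks how many transformer layers are offloaded
# to the GPU; the remainder runs on the CPU.
# (Path and numbers below are placeholder assumptions.)
llm = Llama(
    model_path="llama-70b.Q4_K_M.gguf",
    n_gpu_layers=40,   # as many layers as fit in ~24 GB of VRAM
    n_ctx=4096,        # context window; the KV cache uses memory too
)

out = llm("Explain partial offload in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```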

Umm, two 3090s? Additional cards scale as long as you have enough PCIe lanes.

  • I arbitrarily chose $1k as the "cheap" cut-off. Two 3090s are definitely the most bang for the buck if you can fit them; a rough split is sketched below.
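Following up on the two-3090 idea, here is the same kind of sketch for splitting one model across two cards, again assuming llama-cpp-python; the split ratios and path are illustrative.

```python
from llama_cpp import Llama

# Two 24 GB cards give ~48 GB of VRAM, enough for a ~40 GB 70B Q4 model.
# tensor_split distributes the weights across devices as proportions,
# and n_gpu_layers=-1 offloads every layer. Values below are assumptions.
llm = Llama(
    model_path="llama-70b.Q4_K_M.gguf",
    n_gpu_layers=-1,           # offload all layers; nothing left on the CPU
    tensor_split=[0.5, 0.5],   # even split between GPU 0 and GPU 1
)
```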