Comment by alecco

5 days ago

1. /r/localllama is near-unanimous that the Spark isn't good for running models

2. For CUDA dev it isn't worth the steep price either, when you can develop on a cheap RTX card and then rent a GH or GB server for a couple of days if you need to verify compatibility and scaling.

I am not on reddit. What are they saying?

  • It isn't for "running models." Inference workloads like that are faster on a Mac Studio, if that's the goal; Apple's unified memory bandwidth is higher.

    These devices are for AI R&D. If you need to build models or fine-tune them locally, they're great.

    That said, I run GPT-OSS 120B on mine and it's 'fine'. I spend some time waiting on it, but the fact that I can run such a large model locally at a "reasonable" speed is still kind of impressive to me.

    It's REALLY fast for diffusion as well. If you're into image/video generation it's kind of awesome. All that compute really shines for workloads that aren't memory-bandwidth bound; see the rough numbers sketched below.
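    For a rough sense of why, here's a back-of-envelope sketch in Python. During token generation the weights have to stream through memory once per token, so bandwidth sets a hard ceiling on decode speed. The bandwidth and model-size figures below are assumptions for illustration (commonly cited numbers, not anything I've benchmarked):

    ```python
    # Back-of-envelope: decode throughput of a memory-bandwidth-bound LLM.
    # Ceiling: each generated token reads the (active) weights once, so
    # tokens/sec <= memory_bandwidth / weight_bytes_read_per_token.

    def max_tokens_per_sec(bandwidth_gb_s: float, weights_gb: float) -> float:
        """Upper bound on decode speed when weight reads dominate."""
        return bandwidth_gb_s / weights_gb

    # Assumed bandwidths in GB/s (commonly cited; check current spec sheets).
    devices = {"DGX Spark": 273.0, "Mac Studio (M2 Ultra)": 800.0}

    # Assumed ~120B dense model at ~4 bits/param -> ~60 GB read per token.
    # (MoE models like GPT-OSS 120B read only the active experts per token,
    # so they beat this bound by a wide margin.)
    weights_gb = 60.0

    for name, bw in devices.items():
        print(f"{name}: <= {max_tokens_per_sec(bw, weights_gb):.1f} tok/s")
    ```

    Diffusion doesn't hit that ceiling: it's dominated by compute over a comparatively small set of weights, so the Spark's FLOPS actually get used.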

    • With roughly 5070 Ti-level compute it's a weird choice for R&D as well. You won't be able to train models that need anywhere near 100GB of VRAM at any reasonable speed, and a 5070 Ti is under $1k; see the step-time sketch below.
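      To put a number on "slow", here's a hedged step-time estimate using the standard ~6 FLOPs per parameter per token rule for forward+backward. The parameter count, TFLOPS figure, and utilization are assumptions for illustration, not specs:

      ```python
      # Rough training step time: ~6 FLOPs per parameter per token (fwd + bwd).
      def step_time_s(params: float, tokens_per_step: int,
                      peak_tflops: float, mfu: float = 0.35) -> float:
          """Seconds per optimizer step at an assumed hardware utilization (MFU)."""
          flops_needed = 6.0 * params * tokens_per_step
          return flops_needed / (peak_tflops * 1e12 * mfu)

      # Assumed: a ~6B-param model, which with Adam in mixed precision costs
      # roughly 16 bytes/param of weights+grads+optimizer state, i.e. ~96 GB,
      # about what would fill the Spark's memory. ~125 TFLOPS dense BF16 is
      # assumed for 5070 Ti-class hardware.
      t = step_time_s(params=6e9, tokens_per_step=32_768, peak_tflops=125.0)
      print(f"~{t:.0f} s per optimizer step")  # ~27 s/step under these assumptions
      ```

      At that rate a few thousand optimizer steps is already a day of wall-clock time, which is why the big memory pool doesn't buy you much for training.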
