Comment by alecco

5 days ago

1. /r/localllama is near-unanimous that the Spark isn't good for running models

2. For CUDA dev it isn't worth the steep price either, when you can develop on a cheap RTX card and then rent a GH or GB server for a couple of days if you need to verify compatibility and scaling.

I am not on reddit. What are they saying?

  • It isn't for "running models." Inference workloads like that are faster on a Mac Studio, if that's the goal; Apple's unified memory bandwidth is higher.

    These devices are for AI R&D. If you need to build models or fine-tune them locally, they're great.

    That said, I run GPT-OSS 120B on mine and it's 'fine'. I spend some time waiting on it, but the fact that I can run such a large model locally at a "reasonable" speed is still kind of impressive to me.

    It's REALLY fast for diffusion as well. If you're into image/video generation it's kind of awesome. All that compute really shines for workloads that aren't memory-bandwidth bound; see the rough numbers sketched below.
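    For a rough sense of why, here's a back-of-envelope sketch in Python. During token generation the weights have to stream through memory once per token, so bandwidth sets a hard ceiling on decode speed. The bandwidth and model-size figures below are assumptions for illustration (commonly cited numbers, not anything I've benchmarked):

    ```python
    # Back-of-envelope: decode throughput of a memory-bandwidth-bound LLM.
    # Ceiling: each generated token reads the (active) weights once, so
    # tokens/sec <= memory_bandwidth / weight_bytes_read_per_token.

    def max_tokens_per_sec(bandwidth_gb_s: float, weights_gb: float) -> float:
        """Upper bound on decode speed when weight reads dominate."""
        return bandwidth_gb_s / weights_gb

    # Assumed bandwidths in GB/s (commonly cited; check current spec sheets).
    devices = {"DGX Spark": 273.0, "Mac Studio (M2 Ultra)": 800.0}

    # Assumed ~120B dense model at ~4 bits/param -> ~60 GB read per token.
    # (MoE models like GPT-OSS 120B read only the active experts per token,
    # so they beat this bound by a wide margin.)
    weights_gb = 60.0

    for name, bw in devices.items():
        print(f"{name}: <= {max_tokens_per_sec(bw, weights_gb):.1f} tok/s")
    ```

    Diffusion doesn't hit that ceiling: it's dominated by compute over a comparatively small set of weights, so the Spark's FLOPS actually get used.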

    • With roughly 5070 Ti-level compute it's a weird choice for R&D as well. You won't be able to train models that need anywhere near 100GB of VRAM at any reasonable speed, and a 5070 Ti is under $1k; see the step-time sketch below.
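      To put a number on "slow", here's a hedged step-time estimate using the standard ~6 FLOPs per parameter per token rule for forward+backward. The parameter count, TFLOPS figure, and utilization are assumptions for illustration, not specs:

      ```python
      # Rough training step time: ~6 FLOPs per parameter per token (fwd + bwd).
      def step_time_s(params: float, tokens_per_step: int,
                      peak_tflops: float, mfu: float = 0.35) -> float:
          """Seconds per optimizer step at an assumed hardware utilization (MFU)."""
          flops_needed = 6.0 * params * tokens_per_step
          return flops_needed / (peak_tflops * 1e12 * mfu)

      # Assumed: a ~6B-param model, which with Adam in mixed precision costs
      # roughly 16 bytes/param of weights+grads+optimizer state, i.e. ~96 GB,
      # about what would fill the Spark's memory. ~125 TFLOPS dense BF16 is
      # assumed for 5070 Ti-class hardware.
      t = step_time_s(params=6e9, tokens_per_step=32_768, peak_tflops=125.0)
      print(f"~{t:.0f} s per optimizer step")  # ~27 s/step under these assumptions
      ```

      At that rate a few thousand optimizer steps is already a day of wall-clock time, which is why the big memory pool doesn't buy you much for training.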
