Comment by KaiserPro

5 days ago

> There is no way these people have the resources to train a fully fledged LLM, so claiming that is their goal makes me think they don't intend for the LLM to be useful.

Depends on what they are doing and why. But at most big labs, only the final model training happens on the big clusters; a lot of experimentation happens on <500 GPUs per dev.

So for fast iteration, this seems fine.

This is the use case for the small NVIDIA boxes that a researcher can have on their desk for $5k and use for useful experiments before spending all the grant money on a huge training run for the final product.

  • Yes, they almost certainly can.

    But that only gets you so far: you need a bigger multi-GPU setup to do the higher-dimension stuff. You can use a DGX, but again, that's limiting past a certain point.
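
To put rough numbers on the "higher dimension" point, here is a back-of-envelope sketch. It assumes two common rules of thumb that are not from this thread: non-embedding transformer parameters are roughly 12 * n_layers * d_model^2, and mixed-precision Adam training holds about 16 bytes of state per parameter (weights, gradients, optimizer state), ignoring activations. The function names and the 128 GB device size are made up for illustration, not the spec of any particular box.

```python
def transformer_params(n_layers: int, d_model: int) -> int:
    """Approximate non-embedding parameter count (rule of thumb: 12 * L * d^2)."""
    return 12 * n_layers * d_model ** 2


def training_state_gb(n_params: int, bytes_per_param: int = 16) -> float:
    """Approximate memory for weights + gradients + Adam state, in GB.

    Assumes ~16 bytes/param for mixed-precision Adam; activations are excluded,
    so real usage will be higher.
    """
    return n_params * bytes_per_param / 1e9


def fits_on_device(n_layers: int, d_model: int, device_gb: float) -> bool:
    """True if the estimated training state fits in device_gb of memory."""
    return training_state_gb(transformer_params(n_layers, d_model)) <= device_gb


if __name__ == "__main__":
    DEVICE_GB = 128  # hypothetical single-box memory budget, an assumption
    N_LAYERS = 32    # hypothetical depth, held fixed while width grows
    for d_model in (1024, 2048, 4096, 8192):
        params = transformer_params(N_LAYERS, d_model)
        gb = training_state_gb(params)
        verdict = "fits" if fits_on_device(N_LAYERS, d_model, DEVICE_GB) else "does not fit"
        print(f"d_model={d_model}: ~{params / 1e9:.2f}B params, ~{gb:.0f} GB state, {verdict}")
```

The thing to notice is the quadratic term: doubling the model width roughly quadruples the memory for training state, so a desk-sized box stops being enough fairly quickly even at fixed depth.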