Comment by sailingparrot
7 hours ago
> Nvidia has been using its newfound liquid funds to train its own family of models
Nvidia has always had its own family of models; it's nothing new and not something you should read too much into, IMHO. They use those as templates that other people can leverage, and they are of course optimized for Nvidia hardware.
Nvidia has been training models in the Megatron family, as well as many others, since at least 2019, and Megatron was used as a blueprint by many players. [1]
Nemotron-3-Nano-30B-A3B[0][1] is a very impressive local model. It is good at tool calling and works great with llama.cpp/Visual Studio Code/Roo Code for local development.
It doesn't get a ton of attention on /r/LocalLLaMA but it is worth trying out, even if you have a relatively modest machine.
[0] https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B...
[1] https://huggingface.co/unsloth/Nemotron-3-Nano-30B-A3B-GGUF
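Since the comment above highlights tool calling through llama.cpp, here is a minimal sketch of exercising it against a local llama-server instance via its OpenAI-compatible endpoint. The launch command, port, model alias, and the get_weather tool are illustrative assumptions, not the commenter's actual setup.

```python
# Minimal sketch: tool calling against a local llama.cpp server running the
# Nemotron GGUF. Assumed launch command (llama-server's --jinja flag enables
# the chat template so tool calls come back in OpenAI format):
#   llama-server -hf unsloth/Nemotron-3-Nano-30B-A3B-GGUF --jinja -c 16384 --port 8080

from openai import OpenAI

# llama-server exposes an OpenAI-compatible API; the key is not checked locally.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# A toy tool definition so the model can exercise tool calling.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="nemotron-3-nano",  # llama-server serves one model; this name is arbitrary
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)

# If the model decides to use the tool, the structured call shows up here.
print(response.choices[0].message.tool_calls)
```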
Some of NVIDIA's models also tend to have interesting architectures, for example the use of the Mamba architecture instead of a purely transformer-based design: https://developer.nvidia.com/blog/inside-nvidia-nemotron-3-t...
Deep SSMs, including the entire S4 to Mamba saga, are a very interesting alternative to transformers. In some of my genomics use cases, Mamba has been easier to train and scale over large context windows, compared to transformers.
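For readers who haven't met these layers, here is a minimal sketch of the recurrence they are built on, h_t = Ā·h_{t-1} + B·x_t, y_t = C·h_t. This is a plain diagonal, time-invariant SSM, not Mamba's input-dependent (selective) version, and all shapes and constants are illustrative.

```python
# Minimal state-space recurrence behind S4/Mamba-style layers. The key property
# is the O(L) sequential scan over the sequence, versus attention's O(L^2).
import numpy as np

rng = np.random.default_rng(0)
L, d_state = 1024, 16                    # sequence length, state size per channel

A = -np.exp(rng.normal(size=d_state))    # negative diagonal entries keep the system stable
dt = 0.01
A_bar = np.exp(A * dt)                   # crude discretization of the continuous-time A
B = rng.normal(size=d_state) * 0.1
C = rng.normal(size=d_state)

x = rng.normal(size=L)                   # a single input channel

h = np.zeros(d_state)
y = np.empty(L)
for t in range(L):                       # linear-in-length scan over the sequence
    h = A_bar * h + B * x[t]             # h_t = Ā h_{t-1} + B x_t
    y[t] = C @ h                         # y_t = C h_t

print(y[:5])
```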
It was good for, like, one month. Qwen3 30b dominated for half a year before that, and GLM-4.7 Flash 30b took over the crown soon after Nemotron 3 Nano came out. There was basically no window for it to shine.
It is still good, even if not the new hotness. But I understand your point.
It isn't as though GLM-4.7 Flash is significantly better, and honestly, I have had poor experiences with it (and yes, always the latest llama.cpp and the updated GGUFs).
Genuinely exciting to be around for this. Reminds me of the time when computers were said to be obsolete by the time you drove them home.
I recently tried GLM-4.7 Flash 30b and didn’t have a good experience with it at all.
I find the Q8 runs a bit more than twice as fast as gpt-120b since I don’t have to offload as many MoE layers, but it is just about as capable, if not better.
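The "offload fewer MoE layers" point is a llama.cpp configuration detail; below is a minimal launch sketch, assuming llama-server's --override-tensor flag and a hypothetical local Q8 GGUF filename, that pins the MoE expert tensors to CPU RAM while the rest of the model goes to the GPU.

```python
# Minimal sketch of controlling MoE offload with llama.cpp. The filename, regex,
# and context size are assumptions based on common llama-server usage, not the
# commenter's exact setup.
import subprocess

cmd = [
    "llama-server",
    "-m", "Nemotron-3-Nano-30B-A3B-Q8_0.gguf",  # hypothetical local Q8 file
    "-ngl", "99",                               # push all layers to the GPU...
    "-ot", r"\.ffn_.*_exps\.=CPU",              # ...but keep MoE expert tensors in CPU RAM
    "-c", "16384",
    "--port", "8080",
]

subprocess.run(cmd, check=True)
```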
NeMo is different from Megatron.
Megatron was a research project.
Nvidia has professional services selling companies on using NeMo for user-facing applications.