vkaufmann 21 hours ago
GPT-OSS-120B runs like hell on my DGX Spark.

embedding-shape 19 hours ago
The MXFP4 variant, I suppose? My setup (RTX Pro 6000) does around ~140 tok/s with llama.cpp and around ~160 tok/s with vLLM.

vkaufmann 18 hours ago
Yep, MXFP4. Really fast :D
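For reference, a minimal sketch of how a tok/s figure like the ones above might be measured with vLLM's offline Python API; the model id, prompt, and sampling settings here are assumptions for illustration, not details taken from the thread:

    # Rough generation-throughput check with vLLM (sketch, not a benchmark harness).
    import time
    from vllm import LLM, SamplingParams

    # Assumed Hugging Face model id for the MXFP4 GPT-OSS-120B checkpoint.
    llm = LLM(model="openai/gpt-oss-120b")
    params = SamplingParams(temperature=0.7, max_tokens=512)

    start = time.perf_counter()
    outputs = llm.generate(["Explain speculative decoding in one paragraph."], params)
    elapsed = time.perf_counter() - start

    # Count generated tokens across all requests and divide by wall-clock time.
    generated = sum(len(o.outputs[0].token_ids) for o in outputs)
    print(f"{generated / elapsed:.1f} tok/s")

A single short prompt like this only gives a rough number; sustained throughput depends on batch size, prompt length, and output length.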