Comment by cybertim

1 day ago

I bought two 3080 20GB cards and one of those MACHINIST X99 mainboards as well (one with two full x16 PCIe slots). Those boards come with a Xeon CPU included (for the PCIe lane support). It set me back 800 euros total (I had a spare PSU, SSD and memory in a drawer), and now I'm also happily running Qwen 3.6 Q8 (MTP) at 80 tk/s.

Good call, I really hesitated between the X570 and the X99. Are you using P2P?

  • $ nvidia-smi topo -p2p r

         GPU0  GPU1
    GPU0 X     CNS
    GPU1 CNS   X

    I guess not. I use llama.cpp with:

    --spec-draft-n-max 3 --spec-type draft-mtp --split-mode tensor --tensor-split 1,1

    and my generation speed is between 60 and 80 tk/s.

    I will test this uncensored model, with ngram added as well, this weekend.

    Btw, I also set my power limit to 220 W per card (with nvidia-smi). That will cost you around 1 tk/s but save you a LOT of power and heat :)
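
    For anyone wanting to replicate the power cap, here is a minimal sketch. It assumes two cards at indices 0 and 1 and a driver that allows changing the limit (this needs root, and the setting is normally lost on reboot). The function name set_power_limit and the DRY_RUN variable are made up for illustration; only `nvidia-smi -i <index> -pl <watts>` is the actual command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: cap each listed GPU at the given wattage.
# Usage: set_power_limit <watts> <gpu-index>...
# With DRY_RUN=1 it only prints the commands instead of running them.
set_power_limit() {
  local watts="$1"
  shift
  local idx
  for idx in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "nvidia-smi -i ${idx} -pl ${watts}"
    else
      sudo nvidia-smi -i "${idx}" -pl "${watts}"
    fi
  done
}
```

    Running `DRY_RUN=1 set_power_limit 220 0 1` prints the two commands without touching the cards; drop DRY_RUN to apply them.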

    • CNS means "Chipset not supported", and I doubt that is the case. Are you sure you are using the patched nvidia module? Run modinfo nvidia to check which one is loaded.
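
      A rough sketch of how the check could be scripted, assuming the matrix layout shown earlier in the thread (a header row of GPU names, then one row per GPU). The function name p2p_all_ok is hypothetical. It reads `nvidia-smi topo -p2p r` output on stdin and succeeds only if every GPU-to-GPU cell is OK.

```shell
#!/usr/bin/env bash
# Hypothetical helper: parse `nvidia-smi topo -p2p r` output from stdin.
# Exit status 0 if every cell is X (self) or OK, 1 otherwise
# (for example CNS, or no matrix rows found at all).
p2p_all_ok() {
  awk '
    # Matrix rows start with a GPU name followed by status cells;
    # the header row is skipped because its second field is a GPU name.
    $1 ~ /^GPU[0-9]+$/ && NF > 1 && $2 !~ /^GPU[0-9]+$/ {
      rows++
      for (i = 2; i <= NF; i++)
        if ($i != "X" && $i != "OK") bad++
    }
    END { exit ((rows > 0 && bad == 0) ? 0 : 1) }
  '
}
```

      Typical use would be `nvidia-smi topo -p2p r | p2p_all_ok && echo "P2P ok" || echo "P2P not available"`. For the module itself, `modinfo -F version nvidia` and `modinfo -n nvidia` print the version and path of the module file on disk, while `cat /proc/driver/nvidia/version` should report the driver that is actually running.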
