Comment by thot_experiment
1 month ago
Maybe a skill issue but they both feel about the same and the MoE is 3x faster so I barely use the dense model.
1 month ago
Not the person asked, but on a medium-sized bug spanning a few Python files, I found the MoE too eager to try things without first understanding the issue, whereas the dense model thought hard and added debug statements to figure out how to fix it. But the dense model is quite slow (Q4_K_M quant, MI50 32GB, llama.cpp, pi).