Comment by zettabomb
21 hours ago
SOTA performance is relative to model size. If it performs better than other models in the 70B range (e.g. Llama 3.3) then it could be quite useful. Not everyone has the VRAM to run the full-fat DeepSeek R1.
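For a rough sense of the VRAM gap, here's a back-of-envelope sketch (weights only; it assumes 4-bit quantization and ignores KV cache and activation memory, and takes R1's commonly cited 671B total parameter count):

```python
# Rough weights-only VRAM estimate; real usage is higher (KV cache, activations, overhead).
def weight_vram_gb(params_billion: float, bits_per_param: int) -> float:
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

print(weight_vram_gb(70, 4))    # ~35 GB  -> a 70B dense model at 4-bit
print(weight_vram_gb(671, 4))   # ~336 GB -> full DeepSeek R1 (671B total params) at 4-bit
```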
Also, isn't DeepSeek a Mixture of Experts model? Meaning not all of the params are activated on any single forward pass?
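Right, that's the point of MoE routing: a gate picks a few experts per token, so only a fraction of the total parameters run on each forward pass. A minimal, illustrative top-k routing sketch (not DeepSeek's actual implementation; layer sizes and expert count are made up):

```python
# Minimal top-k MoE sketch: only k of num_experts expert MLPs run per token,
# so the active parameter count per forward pass is far below the total.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, dim=512, num_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, num_experts)  # router scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                                 # x: (tokens, dim)
        scores = self.gate(x)                             # (tokens, num_experts)
        weights, idx = scores.topk(self.k, dim=-1)        # keep only the top-k experts
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

moe = TopKMoE()
y = moe(torch.randn(4, 512))  # each token only touches 2 of the 8 experts
```

Note that all experts still have to sit in memory, so MoE cuts compute per token, not the VRAM needed to load the weights.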
70B feels like the best balance between being runnable locally and good enough for regular use.
Maybe not SOTA, but a great first step.