Comment by zettabomb

21 hours ago

SOTA performance is relative to model size. If it performs better than other models in the 70B range (e.g. Llama 3.3), then it could be quite useful. Not everyone has the VRAM to run the full-fat DeepSeek R1.
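
For a rough sense of scale, here's a back-of-envelope sketch of the VRAM needed just to hold the weights (ignoring KV cache, activations, and framework overhead). The parameter counts are the commonly cited ones (70B dense, roughly 671B total for R1), so treat the numbers as approximate:

```python
# Back-of-envelope weight memory only; real usage is higher once you add
# KV cache, activations, and runtime overhead.
def weight_vram_gb(params_billions, bits_per_param):
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

for name, params_b in [("70B dense (e.g. Llama 3.3)", 70),
                       ("DeepSeek R1 (~671B total)", 671)]:
    for bits in (16, 8, 4):
        print(f"{name} @ {bits}-bit: ~{weight_vram_gb(params_b, bits):.0f} GB")
```

Even at 4-bit, the full R1 needs a few hundred GB for weights alone, while a 70B model at 4-bit is around 35 GB and fits on a couple of consumer GPUs.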

Also, isn't DeepSeek a Mixture-of-Experts model? Meaning not all parameters ever get activated on a single forward pass?
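
For anyone unfamiliar with MoE, here's a minimal top-k routing sketch, just to show that only the selected experts' weights do any work for a given token. This is purely illustrative (the TinyMoE class, layer sizes, expert count, and top_k value are made up), not DeepSeek's actual architecture or code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # gating network
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)]
        )
        self.top_k = top_k

    def forward(self, x):                             # x: (n_tokens, d_model)
        gate_logits = self.router(x)                  # (n_tokens, n_experts)
        weights, expert_idx = torch.topk(gate_logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)          # renormalise over chosen experts
        out = torch.zeros_like(x)
        # Only the top_k experts chosen for each token actually run;
        # the remaining experts' parameters sit idle for this forward pass.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = expert_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(4, 64)        # 4 tokens, hidden size 64
print(TinyMoE()(tokens).shape)     # torch.Size([4, 64])
```

So the total parameter count and the per-token compute are quite different things, which is part of why a big MoE can still be cheap to run per token (though you still have to hold all the weights somewhere).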

70B feels like the best balance between being runnable locally and being good enough for everyday use.

Maybe not SOTA, but a great first step.