Comment by bjconlan
15 hours ago
This is great! I feel the same way about the DeepSeek V4 architecture for commodity hardware.
I've also enjoyed playing with https://huggingface.co/HuggingFaceTB/nanowhale-100m-base (though it's early days for me in understanding this space).
Very cool! I had no idea that HF was doing this - I really love their small model experiments.