
Comment by danielhanchen

17 hours ago

Oh, I didn't expect this to be on HN haha - but yes, for our new Qwen3.5 benchmarks we devised a slightly different approach to quantization, which we plan to roll out to all new models from now on!

Can you describe what this slightly different approach is and why it should work on all models?
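(The thread doesn't answer this, but Unsloth has elsewhere described its quants as "dynamic": sensitive tensors keep more bits rather than everything being quantized uniformly. Below is a minimal, hypothetical sketch of that general idea, not their actual method; the candidate bit widths, the MSE budget, and the tensor names are all made up for illustration.)

```python
import numpy as np

def quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric round-to-nearest quantization to the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale).clip(-qmax, qmax) * scale

def pick_bits(w: np.ndarray, budget_mse: float = 1e-4) -> int:
    """Give a tensor more bits if low-bit quantization distorts it too much."""
    for bits in (2, 3, 4, 6, 8):
        err = np.mean((w - quantize(w, bits)) ** 2)
        if err <= budget_mse:
            return bits
    return 16  # fall back to keeping the tensor in higher precision

# Example: tensors with a wider value spread end up with more bits.
rng = np.random.default_rng(0)
for name, w in [("attn.q_proj", rng.normal(0, 0.02, (256, 256))),
                ("mlp.down_proj", rng.normal(0, 0.2, (256, 256)))]:
    print(name, "->", pick_bits(w), "bits")
```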

Nice! Your stuff already ran LLMs extremely well on <$500 boxes (24-32GB RAM) with iGPUs before this update.

I’m eager to try it out, especially if 16GB is viable now.

  • The 5080 has 16GB of VRAM, not system memory. I don't think you can get 24-32GB of VRAM in a $500 box
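(For rough intuition on whether 16GB is viable: weight memory scales as params × bits / 8, plus runtime overhead for the KV cache and buffers. This back-of-envelope sketch is my arithmetic, not from the thread; the model sizes and the flat 2GB overhead are assumptions.)

```python
def model_memory_gb(n_params_b: float, bits_per_weight: float,
                    overhead_gb: float = 2.0) -> float:
    """Rough estimate: weights at params * bits / 8 bytes, plus a flat
    allowance for KV cache, activations, and runtime buffers."""
    weights_gb = n_params_b * bits_per_weight / 8
    return weights_gb + overhead_gb

# Hypothetical numbers: does a 14B or 32B model fit in 16 GB?
for params_b in (14, 32):
    for bits in (4, 3, 2):
        need = model_memory_gb(params_b, bits)
        print(f"{params_b}B @ {bits}-bit: ~{need:.1f} GB "
              f"({'fits' if need <= 16 else 'too big'} for 16 GB)")
```

Under these assumptions a 14B model fits comfortably at 4-bit (~9GB), while a 32B model only squeezes into 16GB at around 3-bit or below.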