Comment by aseligman
21 hours ago
Some additional context: many real-world agent use cases struggle to balance quality, cost, and performance. This technique can help avoid the tradeoffs that quantization introduces, including unpredictable results while you try to cost-optimize an agent. In some cases the cost savings from using dfloat11 can be significant, since you can squeeze into more affordable GPUs.
* I work with xmad.ai