Comment by naasking
12 hours ago
I'm curious whether frontier labs use any form of compression, such as quantization, on their models to improve performance. The small percentage drop in quality from Q8 or FP8 would still put it ahead of Opus, but should double token throughput. Maybe then interactive use would feel like an improvement.
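For what it's worth, the "double" figure falls out of a back-of-envelope estimate if you assume decoding is limited by how fast the weights can be read from memory and that the baseline is 16-bit weights. A rough sketch of that arithmetic follows; the parameter count and bandwidth are made-up placeholders, not figures for any real model or accelerator, and the function name is my own:

```python
def decode_tokens_per_second(params_billion, bytes_per_weight, bandwidth_gb_s):
    """Rough upper bound on single-stream decode speed.

    Assumes a dense model whose weights are all read once per generated
    token, and that memory bandwidth (not compute) is the bottleneck.
    """
    weight_gb_per_token = params_billion * bytes_per_weight
    return bandwidth_gb_s / weight_gb_per_token


# Hypothetical placeholders, chosen only to illustrate the ratio.
PARAMS_BILLION = 70.0
BANDWIDTH_GB_S = 2000.0

fp16 = decode_tokens_per_second(PARAMS_BILLION, 2.0, BANDWIDTH_GB_S)
fp8 = decode_tokens_per_second(PARAMS_BILLION, 1.0, BANDWIDTH_GB_S)
speedup = fp8 / fp16

print(f"16-bit: {fp16:.1f} tok/s, 8-bit: {fp8:.1f} tok/s, speedup: {speedup:.1f}x")
```

Under those assumptions the ratio depends only on bytes per weight, so halving the width doubles the bound. Batched serving, KV-cache traffic, and compute-bound prefill would all pull the real number away from 2x.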