Comment by beacon294
1 hour ago
I find that the fp8 KV cache can be pretty bad in vLLM but works fine in llama.cpp. I don't know why, but I plan to review the implementations.