Comment by mongrelion
8 hours ago
Which quantization are you running, and what context size? 32 tok/s for that model on that card sounds pretty good to me!