Comment by regularfry 8 days ago They're claiming 20+tps inference on a macbook with the unsloth quant.
embedding-shape 7 days ago Yeah, I'm guessing Mac users still aren't very fond of sharing how long the prefill takes. They usually only share the output tok/s, never the input.