Comment by montroser

4 hours ago

Result is ~12 tokens per second, as reported by OP further down in these comments.

An impressive effort, and better than I would have thought possible on this hardware -- but still pretty far short of what one needs for a satisfactory interactive session.

3 comments

andix 4 hours ago

Especially if you consider that those smaller models are really cheap and fast on platforms like openrouter. Often cheaper than SOTA models by a factor of 100-500, and 2-5x faster in TPS.

causal 2 hours ago

Yeah took way too long to find that result. Being able to run on slow RAM isn't surprising considering you can run a model off an SSD.

greenavocado 1 hour ago

I was about to ask that