Comment by montroser

4 hours ago

The result is ~12 tokens per second, as reported by OP further down in the comments.

An impressive effort, and better than I would have thought possible on this hardware -- but still pretty far short of what one needs for a satisfactory interactive session.

Especially if you consider that those smaller models are really cheap and fast on platforms like OpenRouter: often 100-500x cheaper than SOTA models, with 2-5x the TPS.

Yeah, it took way too long to find that result. Being able to run on slow RAM isn't surprising, considering you can run a model off an SSD.