Comment by pbronez

11 hours ago

The Rapid-MLX team has done some interesting benchmarking suggesting that Qwopus 27B is pretty solid. Their tool includes benchmarking features, so you can evaluate your own setup.

They define a metric called the Model-Harness Index (MHI):

MHI = 0.50 × ToolCalling + 0.30 × HumanEval + 0.20 × MMLU (scale 0-100)
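To make the weighting concrete, here is a minimal sketch of the formula. It assumes each component score is already on a 0-100 scale. The function name and the example scores are hypothetical, for illustration only, and are not taken from Rapid-MLX itself.

```python
def model_harness_index(tool_calling: float, humaneval: float, mmlu: float) -> float:
    """Weighted sum per the MHI formula; inputs are assumed to be 0-100 scores."""
    # Reject scores outside the assumed 0-100 range
    for name, score in (("tool_calling", tool_calling),
                        ("humaneval", humaneval),
                        ("mmlu", mmlu)):
        if not 0 <= score <= 100:
            raise ValueError(f"{name} must be in [0, 100], got {score}")
    # Weights 0.50 / 0.30 / 0.20 sum to 1.0, so the result stays on 0-100
    return 0.50 * tool_calling + 0.30 * humaneval + 0.20 * mmlu


# Hypothetical scores, purely for illustration
print(model_harness_index(90, 80, 70))
```

Because the weights sum to 1.0, the index stays on the same 0-100 scale as its inputs, and tool calling carries half the weight. A model that is weak at tool use will therefore rank low even with strong HumanEval and MMLU scores.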

https://github.com/raullenchai/Rapid-MLX

Pardon the silly question, but why do I need this tool versus running the model directly (and SSH’ing in when I’m away from home)?