Comment by pbronez
12 hours ago
The Rapid MLX team has done some interesting benchmarking that suggests Qwopus 27B is pretty solid. Their tool includes benchmarking features, so you can evaluate your own setup.
They have a metric called the Model-Harness Index (MHI):
MHI = 0.50 × ToolCalling + 0.30 × HumanEval + 0.20 × MMLU (scale 0-100)
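For anyone who wants to plug in their own numbers, here is a minimal sketch of that weighted sum in Python. It assumes each component score is already on a 0-100 scale; the function name `model_harness_index` and the range check are mine for illustration, not part of the Rapid MLX tool.

```python
def model_harness_index(tool_calling: float, human_eval: float, mmlu: float) -> float:
    """Weighted sum from the formula above: 0.50/0.30/0.20.

    Assumption: every input is a score on a 0-100 scale, so the
    result is also on a 0-100 scale.
    """
    for name, score in (("tool_calling", tool_calling),
                        ("human_eval", human_eval),
                        ("mmlu", mmlu)):
        if not 0 <= score <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {score}")
    return 0.50 * tool_calling + 0.30 * human_eval + 0.20 * mmlu


if __name__ == "__main__":
    # Made-up scores, purely for illustration (not real benchmark results).
    print(model_harness_index(tool_calling=80, human_eval=70, mmlu=60))
```

Since tool calling carries half the weight, a 10-point change there moves the index by 5 points, versus 2 points for the same change in MMLU.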
Pardon the silly question, but why do I need this tool versus running the model directly (and SSH’ing in when I’m away from home)?