Comment by pbronez

12 hours ago

The Rapid-MLX team has done some interesting benchmarking that suggests Qwopus 27B is pretty solid. Their tool includes benchmarking features, so you can evaluate your own setup.

They have a metric called the Model-Harness Index (MHI):

MHI = 0.50 × ToolCalling + 0.30 × HumanEval + 0.20 × MMLU (scale 0-100)
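
To make the weighting concrete, here is a minimal Python sketch of that formula. It is not taken from the Rapid-MLX code: the function name `model_harness_index`, the argument names, and the range check are my own assumptions, and the scores in the example are made up for illustration rather than real benchmark results.

```python
def model_harness_index(tool_calling: float, human_eval: float, mmlu: float) -> float:
    """Weighted sum from the formula above; each input is assumed to be on a 0-100 scale."""
    scores = {"tool_calling": tool_calling, "human_eval": human_eval, "mmlu": mmlu}
    for name, score in scores.items():
        # Assumption: reject out-of-range inputs so the result stays within 0-100.
        if not 0 <= score <= 100:
            raise ValueError(f"{name} must be between 0 and 100, got {score}")
    # Weights as quoted: 50% tool calling, 30% HumanEval, 20% MMLU.
    return 0.50 * tool_calling + 0.30 * human_eval + 0.20 * mmlu


if __name__ == "__main__":
    # Hypothetical scores, not measurements of any real model.
    print(model_harness_index(tool_calling=80, human_eval=70, mmlu=60))
```

With those made-up inputs the weighted sum works out to 40 + 21 + 12 = 73. Since tool calling carries half the weight, a model that is strong on MMLU but weak at tool calls will still score low.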

https://github.com/raullenchai/Rapid-MLX

JumpCrisscross 12 hours ago

Pardon the silly question, but why do I need this tool versus running the model directly (and SSH’ing in when I’m away from home)?