Comment by c0rruptbytes
15 days ago
been testing forge with ternary bonsai 8b mlx 2bit, pretty sweet even if the model is limited - real potential with this project, good luck!!
- Broad slice:
  - Full Forge: 48/72 accurate, 72/72 complete, score 66.7%
  - Bare: 18/72 accurate, 24/72 complete, score 25.0%
  - Lift: +30 correct runs, no paired regressions
  - Bare had 42 ToolCallErrors and 6 ToolExecutionErrors; full Forge had none.
- Advanced reasoning:
  - Full Forge: 3/24 accurate, 24/24 complete, score 12.5%
  - Bare: 3/24 accurate, 9/24 complete, score 12.5%
  - Lift: completion improved, but accuracy did not.
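For anyone who wants to sanity-check the numbers above, here is a minimal sketch in Python. It assumes the score is simply accurate runs divided by total runs, which matches every figure in the comment but is not stated there. The `RunSummary` class and the `REPORTED` table are hypothetical names made up for this illustration, not part of Forge.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    """Counts for one configuration on one slice (hypothetical helper)."""
    accurate: int
    complete: int
    total: int

    @property
    def score_pct(self) -> float:
        # Assumption: score = accurate / total, as a percentage.
        return round(100 * self.accurate / self.total, 1)

    @property
    def completion_pct(self) -> float:
        return round(100 * self.complete / self.total, 1)


# Counts copied from the comment above.
REPORTED = {
    "broad": {
        "forge": RunSummary(accurate=48, complete=72, total=72),
        "bare": RunSummary(accurate=18, complete=24, total=72),
    },
    "advanced": {
        "forge": RunSummary(accurate=3, complete=24, total=24),
        "bare": RunSummary(accurate=3, complete=9, total=24),
    },
}


def accuracy_lift(forge: RunSummary, bare: RunSummary) -> int:
    """Extra accurate runs with full Forge compared to bare."""
    return forge.accurate - bare.accurate


if __name__ == "__main__":
    for slice_name, runs in REPORTED.items():
        forge, bare = runs["forge"], runs["bare"]
        print(
            f"{slice_name}: forge {forge.score_pct}% vs bare {bare.score_pct}% "
            f"(lift {accuracy_lift(forge, bare):+d} runs, "
            f"completion {bare.completion_pct}% -> {forge.completion_pct}%)"
        )
```

One thing the arithmetic shows: on the broad slice, bare left 72 - 24 = 48 runs incomplete, which equals the 42 + 6 reported errors. That is consistent with one error per failed run, though the comment does not say so.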