Slacker News

Comment by naasking

3 days ago

Thanks for the link. The GSM8K result actually leads the pack in that table, but math is indeed underwhelming. Qwen 2.5 is in the lead, but BitNet isn't far behind, takes 1/6th as much memory during inference, and was trained on less than 1/4 the number of tokens. Pretty cool.

