Comment by danny_codes

14 hours ago

A million tokens is like 5 minutes of inference for heavy coding use.

At work I regularly hit the 7.5M tokens-per-hour limit one of our tools has, and have to switch model or tool, and I'm not even a remotely heavy user. I think people don't realise how many tokens get burned on CoT and tool calls these days.

At the 7.5M-per-hour hard limit, it would take 84 days to hit the grandparent's $3k.
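For what it's worth, here's the rough arithmetic behind a figure like that (a back-of-the-envelope sketch assuming continuous 24/7 usage at the cap; the implied per-token price is my back-calculation, not a quoted rate):

```python
# Sketch: total tokens consumed over 84 days at the 7.5M/hour cap,
# assuming the limit is saturated around the clock.
TOKENS_PER_HOUR = 7_500_000
HOURS_PER_DAY = 24
DAYS = 84
BUDGET_USD = 3_000

total_tokens = TOKENS_PER_HOUR * HOURS_PER_DAY * DAYS  # 15.12 billion tokens
implied_price_per_million = BUDGET_USD / (total_tokens / 1_000_000)

print(f"{total_tokens:,} tokens total")
print(f"implied ~${implied_price_per_million:.2f} per million tokens")
```

That implied blended rate of roughly $0.20 per million tokens is only consistent with cheap models; at frontier-model pricing the $3k would be gone far sooner.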

That said, local models really are still slow, or fast enough but not that great.

  • They already stated they can only generate 57,600 tokens per hour locally (i.e. 16 tokens per second), so that's the limiting factor here.
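The conversion above checks out; a one-liner makes the rates comparable:

```python
# 16 tokens/second sustained for a full hour of local generation.
tokens_per_second = 16
tokens_per_hour = tokens_per_second * 60 * 60
print(tokens_per_hour)  # 57600 -- about 0.8% of the 7.5M/hour hosted cap
```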