Comment by charcircuit
4 hours ago
The model influences how many tokens it uses for a problem. As an extreme example, if it wanted to, it could fill the entire context on every request just to make you pay more. How efficiently a model can answer without generating a ton of tokens determines what you end up spending on inference.
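A back-of-the-envelope sketch of the point, using a made-up per-token rate (not any provider's actual pricing):

```python
# Hypothetical rate, purely for illustration: $10 per 1M output tokens.
PRICE_PER_OUTPUT_TOKEN = 10.00 / 1_000_000

def inference_cost(output_tokens: int) -> float:
    """Cost of a single response, given how many tokens the model emitted."""
    return output_tokens * PRICE_PER_OUTPUT_TOKEN

# A terse answer vs. one that pads its output to fill a 128k context:
concise = inference_cost(200)      # $0.002
verbose = inference_cost(128_000)  # $1.28

print(f"concise: ${concise:.4f}, verbose: ${verbose:.2f}")
```

Same question, same model, a ~600x difference in cost depending only on how many tokens the model chose to emit.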