Comment by h14h

11 hours ago

Given how widely varying the amount of tokens each model uses for a given task, "price-per-token" is essentially meaningless when doing this sort of comparison.

Artificial Analysis's "Cost to run" model (aka num_tokens_used * price_per_token) is much better, but even that is likely problematic since it's not clear whether running a bunch of benchmarks maps cleanly to real-world token use.