Comment by lelanthran

1 month ago

> It’s in the interest of the model builders to solve your problem as efficiently as possible token-wise. High value to user + lower compute costs = better pricing power and better margins overall.

It's in the interests of the model builders to do that if and only if the user can actually tell that the model is giving them the best value per dollar.

Right now you can't tell.

Why not? Seems like you'd just build the same app on each of the models you want to test and judge how they did.

  • > Why not? Seems like you'd just build the same app on each of the models you want to test and judge how they did.

    I tried that on a few problems; even on the same model the results have too much variation.

    When comparing different models, repeating the experiment gives you different results.

    • Interesting point about model variation. It would be useful to run multiple trials and look at the statistical distribution of results rather than single runs. This could help identify which models are more consistent in their outputs.
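For what it's worth, here is a rough sketch of what "look at the distribution" could mean in practice. Everything in it is an assumption for illustration: the model names, the scores (imagine the fraction of your own tests passed on each run), and the helper `summarize_trials` are all made up, and it uses only Python's standard `statistics` module.

```python
import statistics


def summarize_trials(scores_by_model):
    """Return the run count, mean, sample standard deviation, and range per model."""
    summary = {}
    for model, scores in scores_by_model.items():
        summary[model] = {
            "n": len(scores),
            "mean": statistics.mean(scores),
            # stdev needs at least two data points; fall back to 0.0 for a single run
            "stdev": statistics.stdev(scores) if len(scores) > 1 else 0.0,
            "min": min(scores),
            "max": max(scores),
        }
    return summary


# Hypothetical data: one score per repeated run of the same task.
example_scores = {
    "model_a": [0.50, 0.75, 1.00, 0.75],
    "model_b": [0.75, 0.75, 0.75, 0.75],
}

summary = summarize_trials(example_scores)
for model, stats in summary.items():
    print(model, stats)
```

In this invented data both models have the same mean, but model_a's runs range from 0.50 to 1.00 while model_b's do not move at all, so any single run of model_a could make it look better or worse than model_b. With only a handful of runs the spread estimate is itself noisy, so this is a starting point and not a verdict.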