Comment by danenania

20 hours ago

Model routing is deceptively hard, though. It has halting-problem characteristics: often only the smartest model is smart enough to accurately judge a task's difficulty. And if you need the smartest model to reliably classify the prompt, it's cheaper to just let it handle the prompt directly.

This is why model pickers persist despite no one liking them.
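A quick back-of-the-envelope shows the shape of the tradeoff. The per-million-token prices and token counts below are made up, not any provider's real rates; the point is just that the smart model's read of the prompt dominates both paths:

```python
# Hypothetical prices: if the router itself has to be the smart model,
# routing saves little over just answering with the smart model directly.

SMART_IN, SMART_OUT = 3.00, 15.00   # hypothetical $ per 1M tokens
CHEAP_IN, CHEAP_OUT = 0.15, 0.60

def cost(tok_in, tok_out, price_in, price_out):
    return (tok_in * price_in + tok_out * price_out) / 1e6

prompt_tokens, answer_tokens = 8_000, 500  # a typical longer-context task

direct = cost(prompt_tokens, answer_tokens, SMART_IN, SMART_OUT)

# Routing: the smart model reads the whole prompt just to emit a short
# classification, then the cheap model re-reads it to produce the answer.
routed = (cost(prompt_tokens, 5, SMART_IN, SMART_OUT)
          + cost(prompt_tokens, answer_tokens, CHEAP_IN, CHEAP_OUT))

print(f"direct: ${direct:.4f}")   # $0.0315
print(f"routed: ${routed:.4f}")   # $0.0256 -- the smart read dominates both
```

And that ~20% saving is the best case, where the cheap model actually handles the task; a misroute costs more than never routing at all.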

Yes, but prompt evaluation is far faster than inference, since it can be done (mostly) in parallel, so I don't think that's true.
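That latency claim does hold under a crude prefill/decode model (the throughput numbers below are invented, not measured): prefill processes the prompt in parallel, while decoding is serial per token.

```python
# Crude latency model with invented throughputs: classification needs only
# prefill plus a few output tokens; a full answer pays serial decode for all.

PREFILL_TOK_PER_S = 5_000   # hypothetical parallel prefill throughput
DECODE_TOK_PER_S = 50       # hypothetical serial decode rate

prompt_tokens, answer_tokens = 8_000, 500

classify = prompt_tokens / PREFILL_TOK_PER_S + 5 / DECODE_TOK_PER_S
answer = prompt_tokens / PREFILL_TOK_PER_S + answer_tokens / DECODE_TOK_PER_S

print(f"classify: {classify:.1f}s")  # ~1.7s
print(f"answer:   {answer:.1f}s")    # ~11.6s -- faster, but not cheaper
```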

  • The problem is that input token cost dominates output token cost for the majority of tasks.

    Once you've given the model your prompt and are reading the first output token for classification, you've already paid most of the cost of just prompting it directly.

    That said, there could definitely be exceptions for short prompts where output costs dominate input costs. But these aren't usually the interesting use cases.
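    Under the same hypothetical pricing as above (input $3/M, output $15/M, made-up numbers), the crossover is easy to locate: output cost only dominates when the prompt is shorter than 5x the response.

```python
# Hypothetical prices; the input share of total cost across prompt lengths.
PRICE_IN, PRICE_OUT = 3.00, 15.00
answer_tokens = 500

for prompt_tokens in (200, 1_000, 2_500, 8_000, 50_000):
    c_in = prompt_tokens * PRICE_IN
    c_out = answer_tokens * PRICE_OUT
    print(f"{prompt_tokens:>6} prompt tokens -> "
          f"input is {c_in / (c_in + c_out):.0%} of total cost")
# 200 -> 7%, 1000 -> 29%, 2500 -> 50%, 8000 -> 76%, 50000 -> 95%
```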