Comment by c0brac0bra

1 year ago

I've wondered about this as well. Is there a way to have a small, efficient LLM estimate the general complexity of a task without actually running the full workload?

Scoring complexity on a gradient would let you know when to send a "Sure, one second, let me look that up for you" instead of leaving the user waiting through a long round trip.

For sure: MoE models in fact train exactly that kind of router directly, and the routers themselves aren't very large. But it would also be cheap to just run phi-3 against the request first.
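
Something like this, roughly (just a sketch; `small_model`, `large_model`, and `send` are hypothetical callables standing in for whatever inference stack you're actually using):

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def answer(
    request: str,
    small_model: Callable[[str], str],   # assumption: fast, cheap completion fn (e.g. phi-3)
    large_model: Callable[[str], str],   # assumption: slow, high-quality completion fn
    send: Callable[[str], None],         # pushes a message to the user
) -> None:
    # Cheap pre-pass: ask the small model for a rough 1-5 complexity score.
    score = small_model(
        "Rate the complexity of answering this request from 1 (trivial) to "
        "5 (long research task). Reply with a single digit.\n\n" + request
    ).strip()
    complexity = int(score[0]) if score[:1].isdigit() else 3  # default to "medium"

    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(large_model, request)  # kick off the slow call now
        if complexity >= 3:
            # Looks expensive: acknowledge immediately instead of going silent.
            send("Sure, one second while I look that up for you.")
        send(future.result())
```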

I almost think you could do a "check my work" style response: "I'm pretty sure X... wait, actually Y." Or, if the first answer was right, "Yep, that's correct. I just checked."

There’s time in there to do the check and to get the large model to bridge the first sentence with the final response.
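
A minimal sketch of that flow, under the same assumptions as above (the model callables and prompts are placeholders, not anyone's actual API):

```python
from typing import Callable

def check_my_work(
    request: str,
    small_model: Callable[[str], str],   # assumption: cheap draft answer
    large_model: Callable[[str], str],   # assumption: slower verifier
    send: Callable[[str], None],
) -> None:
    draft = small_model(request)
    send(f"I'm pretty sure: {draft}")  # provisional first sentence goes out immediately

    # Meanwhile the large model checks the draft and bridges to the final response.
    verdict = large_model(
        "Check this draft answer. If it is correct, reply with the single word "
        "AGREE; otherwise give the corrected answer.\n\n"
        f"Question: {request}\n\nDraft answer: {draft}"
    )
    if verdict.strip().upper().startswith("AGREE"):
        send("Yep, that's correct. I just checked.")
    else:
        send(f"Wait, actually: {verdict}")
```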