
Comment by tayo42

7 months ago

I'm naive on this topic, but I would think they would do something like detect what the questions are about, then load a relevant prompt, instead of putting everything in like that?

> I'm naive on this topic, but I would think they would do something like detect what the questions are about, then load a relevant prompt, instead of putting everything in like that?

So you think there should be a completely different AI model (or maybe the same model), with its own system prompt, that receives the request, analyzes it, and chooses a system prompt, and then runs the main model (which may be the same model) with the chosen prompt to respond, adding at least one round trip to every request?

You'd have to have a very effective prompt-selection (or prompt-generation) prompt to make that worthwhile.
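
Concretely, the two-stage flow being proposed would look something like this. The client object, its complete() method, and the prompt labels are all invented here, just to show the shape:

```python
# Sketch of the two-stage "prompt router" flow. The client interface
# and prompt labels are placeholders, not any particular vendor's API.

PROMPTS = {
    "coding": "You are a careful coding assistant...",
    "writing": "You are an editor who improves prose...",
    "general": "You are a helpful assistant...",
}

def respond(client, user_message):
    # Round trip 1: a model (maybe smaller, maybe the same one)
    # classifies the request and picks a system prompt.
    label = client.complete(
        system="Classify the request as one of: coding, writing, general. "
               "Reply with the label only.",
        user=user_message,
    ).strip().lower()
    chosen = PROMPTS.get(label, PROMPTS["general"])

    # Round trip 2: the main model answers under the chosen prompt.
    return client.complete(system=chosen, user=user_message)
```

Every request now pays for the classifier call before the real one, which is the extra round trip I'm talking about.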

  • Not sure why you're emphasizing a round trip when these models already take a few seconds to respond? Not even sure it matters, since these all run in the same datacenter, or you can at least send requests to somewhere close.

    I'd probably reach for embeddings, though, to find relevant prompt info to include.
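
    Something like this, pre-embedding each prompt snippet once and picking the nearest ones at request time. embed() stands in for whatever embedding model you'd actually call:

    ```python
    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    def build_index(embed, snippets):
        # Embed each candidate prompt snippet once, up front.
        return [(s, embed(s)) for s in snippets]

    def select_snippets(embed, index, request, k=3):
        # Embed the incoming request and rank snippets by similarity.
        q = embed(request)
        ranked = sorted(index, key=lambda pair: cosine(pair[1], q), reverse=True)
        return [s for s, _ in ranked[:k]]  # splice these into the system prompt
    ```

    You still pay one embedding call per request, but that's far cheaper and faster than a full LLM round trip.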

    • > I'd probably reach for embeddings, though, to find relevant prompt info to include.

      So tool selection, instead of depending only on the model's ability given the information in context, now depends on both the accuracy of a RAG-like context-stuffing step and then on the model doing the right thing with that context.

      I can't imagine that the number of input prompt tokens you save doing that would ever warrant the output-quality cost of reaching for a RAG-like workaround. The context window is big enough that you shouldn't often hit the problems RAG-like workarounds mitigate in the first place, and because the system prompt, long as it is, is still very small compared to the context window, there's only a narrow band where shaving anything off the system prompt meaningfully relieves context pressure even when you do hit them.

      I can see something like that being useful with a model that has a smaller useful context window, in a toolchain doing a more narrowly scoped set of tasks, where the set of situations it needs to handle is constrained enough that identifying which function bucket a request fits in, and which prompt best suits it, is easy, and where a smaller, focused prompt is a bigger win than it is for a big-window model like GPT-5.


Router models exist and do something like what you describe: they run one model to make a routing decision, then feed the request to a matching model and return its result. They're not popular because they add latency, cost, and variance/nondeterminism. This is all hearsay, mind you.