Comment by jimmypk

7 hours ago

The BM25-first routing bet is interesting. You mention 85% recall@20 on 500 artifacts, but the heuristic classifier routing "short lookups to BM25 and narrative queries to cited-answer" raises a practical question: what does the classifier key on to decide a query is narrative vs. short? Token count? Syntactic structure? The reason I ask is that in agent-generated queries the boundary is often blurry - an agent doing a dependency lookup might issue a surprisingly long, well-formed sentence. If the classifier routes those to the more expensive cited-answer loop, it could negate the latency advantage of putting BM25 first.

Re classifier routing: text-shape signals (token count, syntactic markers) underspecify the boundary, especially for agent-generated queries. The signal that worked better in our policy-gated tool-call setting was the intent context the agent was operating under, not the query string itself. An agent in a "fact-check" context emits long, well-formed sentences that actually want exact-match retrieval; an agent in an "open research" context emits surprisingly short queries that need narrative retrieval. If the runtime can read the tool or skill context at query time, routing on that is less ambiguous than routing on text shape. That doesn't help if the wiki is a black-box MCP server with no caller-side context, but it's worth offering an optional context hint in the lookup payload.
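To make the suggestion concrete, here is a minimal sketch of what a context-hint field in the lookup payload could look like. All names here are hypothetical (`LookupRequest`, `context_hint`, the `"fact-check"`/`"open-research"` labels, and the token-count fallback threshold are illustrative assumptions, not anything from the post):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LookupRequest:
    query: str
    # Hypothetical optional hint supplied by the calling agent's runtime,
    # e.g. "fact-check" or "open-research". Absent for black-box callers.
    context_hint: Optional[str] = None

# Illustrative mapping from intent context to retrieval path.
CONTEXT_ROUTES = {
    "fact-check": "bm25",             # wants exact-match retrieval
    "open-research": "cited-answer",  # wants narrative retrieval
}

def route(req: LookupRequest) -> str:
    # Prefer the caller-supplied intent context when it is present and known.
    if req.context_hint in CONTEXT_ROUTES:
        return CONTEXT_ROUTES[req.context_hint]
    # Fallback: the ambiguous text-shape heuristic (token count) when
    # no context is available, as with a black-box MCP caller.
    return "bm25" if len(req.query.split()) <= 6 else "cited-answer"
```

The point of the sketch is the override order: a long, well-formed fact-check query still routes to BM25 when the hint is present, and the text-shape heuristic only kicks in for hint-less callers.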