Comment by yorwba
11 hours ago
If you give the model access to specialized tools (e.g. web search for question answering) the knowledge doesn't have to be stored in the model weights, which leaves some room for improvement. You'd still be overfitting to benchmarks (since different tasks might require different tools) but not necessarily to specific benchmark questions, so within-domain generalization could be quite good.
As an example of a similar approach, Teapot AI has trained very small models (https://teapotai.com/models) to only answer questions where the answer can be found within the context window. Although not perfect, they do quite well at this compared to larger, more general models.
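To make the "only answer from the context" behavior concrete, here is a toy sketch in Python. It is not how Teapot AI's models work (those are trained language models); it just uses keyword overlap to stand in for the idea. The names `answer_from_context`, `REFUSAL`, and `min_overlap` are made up for illustration.

```python
import re

# Hypothetical refusal string; a real model would phrase this its own way.
REFUSAL = "I don't know based on the provided context."

# Small, illustrative stopword list so common words don't count as evidence.
STOPWORDS = {"a", "an", "the", "is", "was", "it", "in", "of", "to",
             "how", "who", "what", "when", "where", "did", "does"}


def _tokens(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS}


def answer_from_context(question, context, min_overlap=2):
    """Return the context sentence that best matches the question.

    Refuses when no sentence shares at least `min_overlap` content words
    with the question, i.e. nothing is answered from "stored knowledge".
    """
    q = _tokens(question)
    best_sentence, best_score = None, 0
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", context.strip()):
        score = len(q & _tokens(sentence))
        if score > best_score:
            best_sentence, best_score = sentence, score
    if best_score < min_overlap:
        return REFUSAL
    return best_sentence


context = "The Eiffel Tower is 330 metres tall. It was completed in 1889."
print(answer_from_context("How tall is the Eiffel Tower?", context))
print(answer_from_context("Who designed the Louvre pyramid?", context))
```

The first call is answered from the context; the second is refused because nothing in the context supports an answer, even though a larger general model might "know" it from its weights.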
Good point. I have the feeling larger models (20B+) rely too much on their stored knowledge and sometimes fail to use tools because they think they already know the answer. Smaller, specialized tool-calling models could be the smart route for the future.