Comment by throw310822

6 months ago

Are these small models trained to privilege "raw intelligence" over factual knowledge? Is there any indication of how much of a current model's capacity is dedicated to knowledge of multiple languages and tons of facts rather than pure understanding and reasoning?

canyon289 6 months ago

The evaluations provide this indication. You'll see MMLU, GPQA, BIG-bench, etc. in the reports for many models, and those numbers are what you're looking for.

To answer a question you didn't ask: with small models especially, we need to make choices about what to focus on. For this model we focused on text summarization and instruction following, with the idea that users would finetune to gain performance on the tasks that are relevant to them.