Comment by embedding-shape

18 minutes ago

Definitely. Usually I'd wait 2-3 weeks for the ecosystem to catch up and iron out the kinks, or do what I did for GPT-OSS: fix it in the places where it's broken, then judge it once I'm sure it's actually being used correctly.

Otherwise, in that early period, only use the scripts/tools provided by the people releasing the model. That's probably the only way in those 2-3 weeks to be sure you're actually getting the expected responses.