Comment by iugtmkbdfil834
1 day ago
Kudos for trying, and I think it is a great start. Part of the issue is still that individual models (especially local ones) differ greatly in what they can do (and do well). The problem is that you want more custom tags (ideally created by users who want to contribute to a tag's accuracy): 'can it generate csv', 'can it follow schema', 'can it offer position on $controversy_Z'... None of these will be obvious, but they will relate to real use cases.
We go back to the question of 'what does best actually mean'.
Thanks. Completely agree that it would be great to have more fine-grained tags. How can we credibly collect such tags from users without the risk of them gaming the system? Maybe we can aggregate across more diverse leaderboards (LMArena, Vals AI, etc., and the long tail of niche leaderboards)?
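To make the aggregation idea concrete, here is a rough sketch of one way it could work: average each model's normalized rank over whichever leaderboards list it. The board names, model names, and the `aggregate_rankings` helper are all made up for illustration, and rank averaging is just one simple choice, not how any existing leaderboard does it.

```python
from collections import defaultdict


def aggregate_rankings(leaderboards):
    """Average each model's normalized rank across the boards that list it.

    `leaderboards` maps a board name to a list of models, best first.
    """
    scores = defaultdict(list)
    for ranking in leaderboards.values():
        n = len(ranking)
        for position, model in enumerate(ranking):
            # 1.0 for first place, 0.0 for last; a one-entry board scores 1.0.
            scores[model].append(1.0 if n == 1 else 1 - position / (n - 1))
    averaged = {model: sum(vals) / len(vals) for model, vals in scores.items()}
    # Highest average first; ties broken alphabetically for stable output.
    return sorted(averaged.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical data: three boards that disagree with each other.
leaderboards = {
    "board_a": ["model_x", "model_y", "model_z"],
    "board_b": ["model_y", "model_x"],
    "board_c": ["model_y", "model_z", "model_x"],
}
result = aggregate_rankings(leaderboards)
for model, score in result:
    print(f"{model}: {score:.2f}")
```

This does not solve gaming on its own, and a model that appears on only one friendly board can still look better than it should, so a real version would probably want a minimum number of boards per model.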