Comment by iugtmkbdfil834

1 day ago

Kudos for trying; I think it is a great start. Part of the issue is still that individual models (especially local ones) differ greatly in what they can do, and in what they do well. What you really want is more custom tags, ideally created by users who want to contribute to the tags' accuracy: 'can it generate CSV', 'can it follow a schema', 'can it offer a position on $controversy_Z'. None of these will be obvious, but they all relate to real use cases.

We go back to the question of 'what does best actually mean'.

rcanand2025 16 hours ago

Thanks. Completely agree that it would be great to have more fine-grained tags. How can we credibly collect such tags from users without the risk of them gaming the system? Maybe we can aggregate across more diverse leaderboards (LMArena, Vals AI, and the long tail of niche leaderboards)?