Comment by conception
1 month ago
https://crfm.stanford.edu/helm/air-bench/latest/#/leaderboar...
This isn’t the gotcha question you think it is. AI safety is being defined and measured.
Cool, another metric to game like they do the other ones.