Comment by conception
18 days ago
https://crfm.stanford.edu/helm/air-bench/latest/#/leaderboar...
This isn’t the gotcha question you think it is. AI safety is being defined and measured.
Cool, another metric to game like they do the other ones.