It's in the post?
Sorry, what I meant is whether a third party has them on its leaderboards. I don't usually trust most of what any of these vendors claim in their release notes without third-party confirmation. I know it says "verified" there, but I don't see where the SWE-bench results come from a third party, whereas for "HLE-Verified" they do have a citation to Hugging Face.
I was looking for something closer to: https://www.vals.ai/benchmarks/swebench
"SWE-Bench Verified" is the name of the benchmark: https://dev.to/duplys/swe-bench-swe-bench-verified-benchmark.... Same with "HLE-Verified". It's nothing to do with third party testing. The citation you point to makes that clear.