Comment by jstummbillig
2 months ago
What is the reason? This is a stress test. You can tell that by reading the first sentence of the article: "We stress-tested 16 leading models from multiple developers". In a stress test you want things to fail, otherwise you have learned very little about the stress the thing you are testing can take.
For physical things that has early limitations, but not for software. I would be very confused to see an AI stress test that did not end in failure, and would always question the test instead of thinking "wow, that must mean the thing is ready for autonomous action!"
Destructive stress testing is done on materials not "people"
Apparently it's now done on "people" (which, in a very important distinction, are not actual people but software)