Comment by johnbarron

4 hours ago

>> Why would I do that?

I agree you don't owe anyone a reproduction, but you also don't owe anyone an effort to discredit the study, and yet you made one.

>> I don't think I need to spend more time on this than I have.

How pious of you. I am still looking into the credibility of the study. It will take me more than 25 minutes, but I am really looking forward to seeing what this means for this 10 trillion industry.

I can't help noticing, however, that you had enough urgency to publicly critique the study within 25 minutes, and your comments carry weight. Yet when asked about checking whether the headline result actually holds, the answer is “why would I?”

I've seen enough of this study to be confident in warning people not to take it at face value.

The headline result definitely does not hold: the task involves many questions that cannot be answered, yet there is no "cannot be answered" option, so models are forced to reply effectively at random.

I don't think this study is good enough that I should amplify it on my own blog, or bad enough that I should criticize it in a venue any more prominent than some Hacker News comments.