Comment by tantalor

3 months ago

The human baseline seems flawed.

1. There is no initial screening to filter out garbage responses, such as users who just pick the first answer.

2. They don't ask for reasoning/rationale.
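
A rough sketch of the kind of screening point 1 has in mind, in Python. Everything here is hypothetical: the `(respondent_id, chosen_index)` input format, the function name, and the `min_answers` and `threshold` defaults are assumptions for illustration, not anything taken from the published data dump or from Rapidata's actual pipeline. It also assumes answer order is randomized per question, since otherwise an honest respondent could legitimately pick the first option often.

```python
from collections import defaultdict


def flag_first_option_pickers(responses, min_answers=5, threshold=0.9):
    """Return ids of respondents who almost always choose the first option.

    responses: iterable of (respondent_id, chosen_index) pairs, where
    chosen_index 0 means the first option shown (hypothetical format).
    """
    totals = defaultdict(int)   # answers given per respondent
    firsts = defaultdict(int)   # answers where the first option was chosen
    for respondent_id, chosen_index in responses:
        totals[respondent_id] += 1
        if chosen_index == 0:
            firsts[respondent_id] += 1
    # Only judge respondents with enough answers to say anything about them.
    return {
        rid
        for rid, n in totals.items()
        if n >= min_answers and firsts[rid] / n >= threshold
    }


# Made-up example: "a" always picks the first option, "b" varies,
# "c" has too few answers to be judged either way.
example = (
    [("a", 0)] * 6
    + [("b", 0), ("b", 1), ("b", 2), ("b", 0), ("b", 3)]
    + [("c", 0)] * 2
)
flagged = flag_first_option_pickers(example)
```

Flagged respondents could then be dropped before computing the baseline, or the baseline could be reported both with and without them.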

8 comments

slongfield 3 months ago

My favorite example of this was the Pew Research study: https://www.pewresearch.org/short-reads/2024/03/05/online-op...

They found that 12% of opt-in survey respondents under 30 claimed to be licensed to operate a class SSGN (nuclear) submarine.

mwigdahl 3 months ago

Lizardman's Constant is famously 4%. https://en.wikipedia.org/wiki/Slate_Star_Codex#Lizardman's_C...

felix089 3 months ago

Re 1: they actually do pre-screen participants in general. You can check how they do it in detail: https://www.rapidata.ai/

tantalor 3 months ago

Ah, that's good to hear. I didn't see anything like that in the data dump so I assumed they don't do that. Glad to be corrected.

andreasgl 3 months ago

I agree. I wonder what the human baseline is for "what is 1 + 1" on Rapidata.

rapidata 3 months ago
We try a bit harder than that, my friend.
- andreasgl 3 months ago
  
  I actually didn't mean to criticize Rapidata. I just think that a forced-choice question like this begs for low-effort answers. At least the respondents should have had the opportunity to explain their reasoning, like the LLMs did.
  