Comment by tantalor

1 month ago

The human baseline seems flawed.

1. There is no initial screening to filter out garbage responses, for example from users who just pick the first answer.

2. They don't ask for reasoning/rationale.

I agree. I wonder what the human baseline is for "what is 1 + 1" on Rapidata.

  • We try a bit harder than that my friend.

    • I actually didn't mean to criticize Rapidata. I just think that a forced-choice question like this begs for low-effort answers. At least the respondents should have had the opportunity to explain their reasoning, like the LLMs did.
