Comment by BoorishBears

1 year ago

You're forgetting completion typically isn't binary.

Take judging response pairs for DPO, for example: how do you ever prove someone used ChatGPT?

ChatGPT is good enough to decide in a way that will feel internally consistent, and even if you ask MTurk users to provide their logic, ChatGPT can produce a convincing response. Eventually you're forced to start measuring noisy 2nd and 3rd order signals like "did the writing in their rationale sound like ChatGPT?"

And what's especially tough is that this affects hard-to-verify tasks disproportionately, while those are exactly the kinds of tasks you'd generally want MTurk for.

makeitdouble 1 year ago

Yes, a very good point.

> And what's especially tough is that this affects hard-to-verify tasks disproportionately, while those are exactly the kinds of tasks you'd generally want MTurk for.

That's where I'd see MTurk just shutting down as a very real possibility. Between fraud management and consumers leaving the platform because of too much junk or unverifiable results, the whole concept could just fall apart from a business standpoint.

We saw the same phenomenon, I think, with the earlier "get paid to navigate the web" kind of schemes way back in the day, with a watch process monitoring the user's actions on the computer and paying by the hour. Very quickly people found new ways to fake activity and game the system, and it all just shut down.