Comment by zozbot234

6 years ago

> Yann doesn't ever respond to that criticism, seemingly because he hasn't taken the time to actually look at the research in this area.

No, that's still a problem with data in a broader sense. The issue is that "how X will be treated by the justice system" is not modeled by the data, so there's no possible pathway for an ML model to become aware of it as something separate from "crime". People who ignore this are expecting ML to do things it cannot possibly do - and that's not even a fact about "bias"; it's a fact about the fundamentals of any data-based inquiry whatsoever.


joshuamorton 6 years ago

I hope you read to the end of my post where I address that:

> So insofar as data is the root of the problem, yes. Insofar as the solution is to just use more representative data in the same systems, no. That doesn't fix things.

Ultimately Yann's proposals are still to use "better data" whereas all the ethics people are (and have been) screaming no, you can't use better data because it doesn't exist. He doesn't acknowledge that.

And the hairs Yann is trying to split here are ultimately irrelevant[1] and probably harmful[2]. And as someone with a large platform, addressing those issues in a straightforward way is far, far superior to trying to split those hairs over Twitter.

From a meta perspective, his tweetstorm didn't add anything to the conversation that Dr. Gebru and her collaborators aren't already aware of. Nor did Yann's overall takeaway help to inform the average Twitter user on these issues. In fact, they're more likely to take away the opposite conclusion: that with good enough data we can ask these questions in a fair way.

But as you rightly conclude, there are flaws in any data-based inquiry. Yann doesn't concede that.

[1]: https://twitter.com/isbellHFh/status/1275184863159685121

[2]: https://twitter.com/hardmaru/status/1275088134238162944

zozbot234 6 years ago

I'm not sure that Yann was trying to split hairs there. He was reasoning about the issue from first principles (e.g. the problem-domain vs. architecture vs. data distinction) and then failing to carry his reasoning through to the reasonable conclusion that you mention re: the inherent flaws of any data-based modeling. Criticizing his take wrt. these issues is constructive; being careless about what his actual views are is not.