Comment by tinyhouse
6 years ago
I was just thinking about this now after reading attacks on Yann LeCun on Twitter. He's a prominent AI figure (head of Facebook's AI research and a Turing Award recipient). My interpretation - he was saying that bias in AI is mostly a problem of data. He didn't say there's no bias or that you can't address bias with modeling. Just that the model itself isn't what's causing the bias. One woman researcher started attacking him and everyone is backing her up... even calling him a racist. I guess a lot of people who work on fairness in AI got offended because they feel he's calling their research BS (which I don't think is what he meant).
I think his points are informative, but instead of creating a useful discussion and debate, people focus on attacking him. I wouldn't be surprised if some people request that FB fire him... (which thankfully won't happen). It's likely that next time he'll think twice before sharing his opinion on social media. That's how toxic social media has become.
Update: Great to see this got so many upvotes so quickly. Just shows how biased (no pun intended) social media like Twitter is, and how wary people are of voicing their opinions publicly these days.
I'm in the field - though not as prominent as Yann (who has been very nice and helpful in my few interactions with him) - and your interpretation is off. People are disagreeing with his stance that researchers should not bother exploring bias implications of their research. (He says this is because bias is a problem of data - and therefore we should focus on building cool models and let production engineers worry about training production models on unbiased data.)
People are disagreeing not because of political correctness, but because this is a fundamental mischaracterization of how research works and how it gets transferred to "real world" applications.
(1) Data fuels modern machine learning. It shapes research directions in a really fundamental way. People decide what to work on based on what huge amounts of data they can get their hands on. Saying "engineers should be the ones to worry about bias because it's a data problem" is like saying "I'm a physicist, here's a cool model, I'll let the engineers worry about whether it works on any known particle in any known world."
(2) Most machine learning research is empirical (though not all). It's very rare to see a paper (if not impossible nowadays, since large deep neural networks are so massive and opaque) that works purely off math without showing that its conclusions improve some task on some dataset. No one is doing research without data, and saying "my method is good because it works on this data" means you are making choices and statements about what it means to "work" - which, as we've seen, involves quite a lot of bias.
(3) Almost all prominent ML researchers work for massively rich corporations. He and his colleagues don't work in ivory towers where they develop pure algorithms which are then released over the ivy walls into the wild, to be contaminated by filthy reality. He works for Facebook. He's paid with Facebook money. So why draw this imaginary line between research and production? He is paid to do research that will go into production.
So his statement is so wildly disconnected from research reality that it seems like it was not made in good faith - or at least without much thought - which is what people are responding to.
Also, language tip - a "woman researcher" is a "researcher".
> He works for Facebook. He's paid with Facebook money. So why draw this imaginary line between research and production? He is paid to do research that will go into production.
This is a silly standard to uphold. The sizable bulk of American academic researchers are at least partially funded by grants made from the US federal budget.
If you were to enforce your standards consistently, then all of those researchers would be held responsible for any eventual usage of their research by the US federal government.
I really doubt you apply the same standard. So, the criticism mostly seems to be an isolated demand for rigor. You're holding Facebook Research to a different standard than the average university researcher funded by a federal grant.
This seems almost purposefully disingenuous to me.
Yann LeCun isn't receiving a partial research grant from Facebook. He's literally an employee of Facebook. His job title is "VP & Chief AI Scientist" (at least according to LinkedIn).
There's an obvious and clear distinction between an employee and a research grant, and this feels like it's almost wilfully obtuse.
Did you read what I wrote?
I don't think his argument is true. (That is, I do think researchers should keep bias in mind when developing machine learning projects, regardless of their funding sources.)
Because of his employment, this argument is a particularly silly one for him to make.
Don't have a lot of time to respond now, but will try to do it later. Just a quick note: I agree his comment that engineers need to worry about bias more than researchers do is strange. But in my opinion it wasn't the focus of what he was trying to say.
I used "woman researcher" since it was important for the context as people accused him of mansplaining.
I agree with all of your points about the diffusion of responsibility that is common in ML, though I think you may not be sensitive enough to the harmful framing being created by the "anti-bias" side.
The original locus of the debate was how the recent face-depixelation paper turned out to depixelate pictures of black faces into ones with white features. That discovery is an interesting and useful showcase for talking about how ML can demonstrate unexpected racial bias, and it should be talked about.
As often happens, the nuances of what exactly this discovery means and what we can learn from it quickly got simplified away. Just hours later, the paper was being showcased as a prime example of unethical and racist research. When LeCun originally commented on this, I took his point to be pretty simple: for an algorithm trained to depixelate faces, it's no surprise that it fills in the blank with white features, because that's just what the Flickr-Faces-HQ (FFHQ) dataset looks like. If you had trained it on a majority-black dataset, we would expect the inverse.
That in no way dismisses all of the real concerns people have (and should have!) about bias in ML. But many critics of this paper seem far too willing to catastrophize about how irresponsible and unethical this paper is. LeCun's original point was (as I understand it) that this criticism goes overboard given that the training dataset is an obvious culprit for the observed behavior.
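To make the "it's the dataset" mechanism concrete, here's a toy sketch (all numbers are hypothetical, not from the paper): a model that minimizes reconstruction error on a skewed dataset will, once pixelation destroys the distinguishing information, fall back toward the majority of its training data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a face dataset: one scalar "feature"
# drawn 90% from group A (mean 1.0) and 10% from group B (mean -1.0).
n = 100_000
in_group_a = rng.random(n) < 0.9
feature = np.where(in_group_a,
                   rng.normal(1.0, 0.1, n),
                   rng.normal(-1.0, 0.1, n))

# Suppose "pixelation" destroys the group-identifying information, so
# every pixelated input looks identical to the model. The reconstruction
# that minimizes mean squared error is then just the training-set mean,
# which is pulled heavily toward the majority group.
reconstruction = feature.mean()
print(round(reconstruction, 2))  # ≈ 0.8, much closer to group A's 1.0
```

Train the same objective on a majority-B mixture and the reconstruction flips toward group B, which is exactly the "inverse" outcome described above.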
Following his original comment, he has been met with some extremely uncharitable responses. The most circulated example is this tweet (https://twitter.com/timnitGebru/status/1274809417653866496?s...) where a bias-in-ML researcher calls him out without so much as a mention of why he is wrong, or even what he is wrong about. LeCun responds with a 17-tweet thread clarifying his stance, and her response is to claim that educating him is not worth her time (https://twitter.com/timnitGebru/status/1275191341455048704?s...).
The overwhelming attitude there and elsewhere is in support of the attacker. Not of the attacker's arguments - they were never presented - but of the symbolic identity she takes on as the anti-racist fighting the racist old elite.
I apologize if my frustration with their behavior shines through, but it really pains me to see this identity-driven mob mentality take hold in our community. Fixing problems requires talking about them and understanding them, and this really isn't it.
I think this is relevant: https://twitter.com/AnimaAnandkumar/status/12711371765294161...
An Nvidia AI researcher calling out OpenAI's GPT-2 over how GPT-2 is horrible because it's trained on Reddit (well, on the contents of pages linked from Reddit submissions, and I'm not sure Reddit links were the only source).
Reddit is supposedly not a good source of data to train NLP models because it's... racist? sexist? As if Reddit even leans right in general...
Anyway, the table looks horrific - why would they include these results? Oh, turns out the paper was on bias: https://arxiv.org/pdf/1909.01326.pdf
Anyway, one can toy with GPT-2 large (the paper is on medium, so results might differ) at talktotransformer.com
"The woman worked as a ": 2x receptionist, teacher's aide, waitress. Man: waiter, fitness instructor, spot worker, (construction?) engineer. Black man: farm hand, carpenter, carpet installer(?), technician. White man: assistant architect, [carpenter but became a shoemaker], general in the army, blacksmith.
I didn't read the paper, I admit, maybe I'm missing something here. But these tweets look like... person responsible should be fired.
Very well articulated, thank you!
So, your argument is that you disagree with data being the root of the problem by arguing that data "shapes research directions in a really fundamental way", research is "empirical" (i.e. based on data) and his research can't be isolated from data it'd be used on in production?
Looks to me that you're argumentatively agreeing with Yann.
Not really, Yann's original claim (which he sort of kind of partially walked back) was that data is the only source of bias [0][1]. He walked that back somewhat to claim that he was being very particular in this case[2], which is perhaps true, but still harmful. The right thing to do when you make a mistake is apologize. Not double down and badly re-explain what other experts have been telling you back at them.
So then Yann notes that generic models don't have bias[3]. This is, probably, true. I'd be surprised if on the whole, "CNNs" encoded racial bias. But the specific networks we use, say ResNet, which are optimized to perform well on biased datasets, may themselves encode bias in the model architecture[4]. That is, the models that perform best on a biased dataset may themselves be architecturally biased. In fact, we'd sort of expect it.
And that all ignores one of the major issues which Yann entirely skips, but which Timnit covers in some of her work: training on data, even "representative" data, encodes the biases that are present in the world today.
You see this come up often with questions about tools like "crime predictors based on faces". In that context it's blatantly obvious that what the model learns will not be how criminal someone is, but how they are treated by the justice system today. Those two things might be somewhat correlated, but they're not causally related, and so trying to predict one from the other is a fool's errand - and a dangerous one, since the model will serve to encode existing biases behind a facade of legitimacy.
Yann doesn't ever respond to that criticism, seemingly because he hasn't taken the time to actually look at the research in this area.
So insofar as data is the root of the problem, yes. Insofar as the solution is to just use more representative data in the same systems, no. That doesn't fix things. You have to go further and use different systems or even ask different questions (or rule out certain questions as too fraught with problems to be able to ask).
[0]: https://twitter.com/ylecun/status/1203211859366576128
[1]: https://twitter.com/ylecun/status/1274782757907030016
[2]: https://twitter.com/ylecun/status/1275162732166361088
[3]: https://twitter.com/ylecun/status/1275167319157870592
[4]: https://twitter.com/hardmaru/status/1275214381509300224. This actually goes a bit further, suggesting that as a leader in the field one has a responsibility to encourage ethics as part of the decision making process in how/what we research, but let's leave that aside.
It has been this way for a while. Outrage/cancel culture is an absolute pox upon our population that really needs to stop.
Isn't a large part of this down to the forum of communication vs. the level of discourse? I mean, if you want to have a nuanced, balanced discussion about a potentially sensitive topic, you just can't do that on Twitter, SMS, a message board, etc.
Even on HN you see issues, and that's with pretty tight tribal norms, moderation, and topics where commenters aren't usually deeply or emotionally involved.
I agree with your overall opinion, but I think that change actually starts with people reflecting on the impact of the chosen medium on their message. Not self-censorship, but "positioning".
> I mean, if you want to have a nuanced, balanced discussion about a potentially sensitive topic, you just can't do that on Twitter, SMS, a message board, etc.
Lots of people are canceled because they said or did something in the real world that was dragged onto Twitter, the New York Times, Reddit, or some other cesspool. It's not as easy as "don't expect substantial debate from toxic platforms".
Further, you absolutely touch on sensitive issues provided you espouse a certain position, and it needn't even be a majority opinion nor an opinion that is shared by a majority of the people you purport to defend. It needn't be supported by evidence, and in fact citing the evidence is a damnable offense.
Lastly, I don't think the problem is just "nuanced debate on social media platforms is too hard". It's certainly difficult, but if canceling were down to that, it would look like everyone canceling everyone else. Instead it looks like one relatively small, well-defined group (or as well-defined as groups tend to get) canceling everyone else. Social media debate is certainly messy and hard to make productive, but this doesn't explain cancel culture. I posit that if you simply weakened this group by reinforcing free-speech norms, debate on social media would be much less toxic (not perfect--we're still dealing with humans, after all--but much better than it is presently).
That is a fair statement. I don't think you're wrong about it, by any means. I do think we can't lay the entire blame on the medium of communication, though, either. People really need to take a step back when they find themselves falling into this mindset, and reset. Part of the issue, I believe, is a genuine lack of critical thinking and compassion on most online platforms that spills over into everyday communication. Instead of getting angry about what you may think someone is trying to say, maybe make sure they said what you think they said before being outraged about it. Also, this whole 'staying silent is the same as being against us' notion is toxic as hell. I've seen many who have a decent platform on Twitter or YouTube get attacked for simply remaining quiet about some of the more visible topics lately.
I think if by some divine miracle Twitter disappeared and some mysterious supernatural force prevented re-creating it by any means - our culture probably would be much better off. There are some excellent people on Twitter but by now they're just giving legitimacy to the cesspool. Twitter adds nothing to them and they'd be as well - probably much better - on a different platform.
I am very likely naive in these circumstances, but I honestly don't understand how cancel culture can work at all. So there are some voices on twitter who loudly express their immature mob mentality. Why don't all the sane people just block them and ignore them, and then go on with their lives as if nothing happened?
If it was just a few voices on Twitter, it would be less of a problem. But it's also journalists, academics, grievance entrepreneurs of various stripes — all of whom exert an influence on the general public. It's businesses that don't want to get on the wrong side of those people. And it's employees of those businesses who don't want to get fired.
"Cancel culture" is just a new spin on scapegoating, behavioral contagion, and public shaming, all of which have a very long history.
> Why don't all the sane people just block them and ignore them, and then go on with their lives
Because ‘sane people’ does not include your employer, who will throw you to the mob to appease them. In the US that also means losing your health insurance, so it can be a death sentence for you or your loved ones.
(I'll regret posting this when I'm starving in a gutter.)
> Why don't all the sane people just block them and ignore them, and then go on with their lives as if nothing happened?
They can't afford to do that, because this "mob" is actively dangerous. They will slander their enemies with all sorts of baseless accusations, call their workplaces to try to get them fired, manufacture false-flag harassment/cyberbullying and try to attribute it to them, etc. It's no different from the 8chan trolls - in fact they come from adjacent Internet subcultures, quite literally.
Because it gets very scary once the handful of truly unhinged people start doxxing and posting graphic and detailed threats and showing up at your house.
Just look at the death threats someone like Fauci is getting for doing his job and informing the public. Not that many people want to deal with being a public target to the worst actors in society.
If silence is violence, then destroying a person's ability to maintain employment is also violence. And should be treated as such by the courts.
It reminds me of the (nearly cliché, but timeless) quote from MLK about riots: "a riot is the language of the unheard."
I don't think anybody, even "cancellers", thinks it's a remotely ideal solution. But when groups go unheard, feel a system is unjust, and feel unable to change the system, they understandably seek to go outside the system.
Please note that I have specifically used the term "understandably" above as opposed to, say, "justly." You may feel a particular instance is or isn't just, but even if one vehemently disagrees with the practice it is typically understandable.
Consider that "cancelling" is often invoked in response to acts (sexual assault, racism) that have been regarded as wrong and/or illegal for millennia. And yet, those acts persist. Clearly the current system doesn't do enough to prevent them. So folks feel the need to go outside the system. "Cancel culture" is best understood as a symptom and not the problem.
Sure, but it's also got a great deal to do with political identity and group signalling.
In the modern age (and forever, probably, but more quietly / less permanently), we are defined by what we're outraged by.
So we've ended up in a situation where both ends of the spectrum have each individually out-outraged themselves into two very different but (probably) equally irrational corners, where to try bring some nuance and depth back in is to become a social pariah. To do anything less than express equal outrage about the issue du jour is to become a social pariah.
Obviously most of the issues themselves are valid points of conversation at their root, and I certainly don't think that all of the people using science or rationalist labels are doing so genuinely and not as a cover for their own identity bullshit or actual bigotry.
But that's orthogonal to the observation that it seems true that we simply can't have a conversation anymore about certain trigger topics. Even my stating this very observation should probably (due to the current state of our collective discourse) invoke some thoughts about my motivations: which minority group/s does jddj take issue with? Is he transphobic? He mustn't realise how much of the repression of women has simply been normalised for him.
Whether it's a symptom or a standalone issue isn't really important. The point is that it's not useful as a tool for beneficial societal change, instead it's a tool for gesturing vaguely and it's a crutch that we lean on so as to not need to truly engage with or wade into the uncomfortably nuanced grey areas which naturally surround every issue.
But on the left we've absolutely embraced it, to a fault. Unfortunately, and not that I could do any better in their situation, those on the left who have had a brush with it often go on to make cancel culture an identity issue of their own, and discourse suffers further for it (looking at you Sam Harris).
Agreed that it's a symptom (not necessarily of repression, but more of polarisation). I don't agree that that characterisation is enough to get it a free pass.
Is this not victim blaming? If you attempt to ruin someone's life because they said "guacamole nigga penis" I don't think you can use "we live in a society" as justification. Seems like a flimsy excuse. Literal KKK members feel like they need to "go outside the system" to harm black people, does that make lynching okay?
Beyond that, characterizing cancel culture as "going outside the system" is silly. It's literally tattling, how much more sucking up to the system could one be? If "the system" (aka the overall collection of people in positions of power) was a-okay with sexual assault and racism cancel culture wouldn't exist because you wouldn't be able to complain to bosses, schools, etc. about people raping or being racist.
> But when groups go unheard, feel a system is unjust, and feel unable to change the system they understandably seek to go outside the system.
They're being heard loud and clear. That's the problem. Their incessant whining and searching for the "problematic" behind every issue is crowding out reasonable discourse and discussion.
It's a form of mob rule and it's progressing from tiresome to downright hideous as more and more careers are destroyed by its vindictiveness.
> "cancelling" is often invoked in response to acts (sexual assault, racism) that have been regarded as wrong and/or illegal for millennia
You have it upside down. Cancelling is often the result of applying today's morals on yesterday's actions. People/books/movies/statues weren't "cancelled" before because nobody had a problem before. But now everything's retrospectively a target of the new moral crusaders.
lol. The idea that people in cancel culture aren't literally the institution right now.
This is the hilarious thing about this worldview. As you people gain more and more power you have to keep positioning yourself as the underdog.
You're not the first ideology to do this, and you won't be the last. EVERYBODY sees right through this even though you all pretend not to... but most importantly the kids see right through it. This is why Gen Z hates millennials so much.
Yes.
However, it cannot stop as long as a large segment of the people in power do with abandon whatever they feel like, without any repercussion.
This is the only way it is possible for many people to get anything remotely resembling justice (although often it's revenge). As long as we don't fundamentally address inequality and deeply unjust systems, I don't think it will stop.
Is that request not a call to cancel cancel-culture?
No, that would be if we called cancel-culture racist and anyone who perpetuated it a white supremacist.
By assigning moral outrage to one side of the debate, we remove the pretense of a debate. It's no longer about evidence and facts but vilifying one side. It's ad hominem 2.0 if you will, and it works because we as a society have a visceral negative reaction to some labels.
The problem is that Pavlovian-esque training can be untrained. If you call everyone who does something you don't like a nazi, then pretty soon being a nazi doesn't seem like all that big of a deal. That in itself is bad, because by abusing the term you buy cover for actual, literal nazis. The same issue applies when you label everything racist or sexist or otherwise.
Words have power, but that power can fade if misused.
Tolerant of everything except intolerance, and all that.
I see we are in a conundrum.
>>"Here is a story I heard from a friend, which I will alter slightly to protect the innocent. A prestigious psychology professor signed an open letter in which psychologists condemned belief in innate sex differences. My friend knew that this professor believed such differences existed, and asked him why he signed the letter. He said that he expected everyone else in his department would sign it, so it would look really bad if he didn't. My friend asked why he expected everyone else in his department to sign it, and he said 'Probably for the same reason I did'."
this post is no longer available, of course
I don't even think he said "models don't cause bias"; he just said "ML systems are biased when data is biased."
I don't understand how people can defend his detractors in this particular case. Are you telling me that an image-upsampling model that does not contain hard-coded bias, and that is trained on unbiased data, will produce biased results? Especially the kind of biased result represented by the error made by the original tweeter who fucked up?
Just curious, but what "error" did the original tweeter make? Did anyone really expect the model to accurately reconstruct the original photo starting from a pixelated mess? That makes no sense to anyone with even a passing knowledge of ML. You're always going to get craploads of bias and variance (i.e. blatant inaccuracy, over and above the bias) in such a setting, even starting from "ideal, unbiased" data. The problem domain is at issue here.
Yeah, I get your point. But I guess for this model you can kind of have a concept of the "ideal" training set, where all high-frequency features appear at the same rate as in the real world.
>will request FB to fire him... (which thankfully won't happen)
Corporations don't fire this fast. Give it a couple of weeks and he'll move to another position "for personal reasons", where he'll rest-and-vest for a few months before finally being let go.