Comment by JangoSteve

6 years ago

This is so incredibly common, it's embarrassing. I was on an expert panel about "AI and Machine Learning in Healthcare and Life Sciences" back in January, and I made it a point throughout my discussions to keep emphasizing the amount of bias inherent in our current systems, which ends up getting amplified and codified in machine learning systems. Worse yet, it ends up justifying the bias based on the false pretense that the systems built are objective and the data doesn't lie.

Afterward, a couple of people asked me to put together a list of the examples I cited in my talk. I'll be adding this to my list of examples:

* A hospital AI algorithm discriminating against black people when providing additional healthcare outreach by amplifying racism already in the system. https://www.nature.com/articles/d41586-019-03228-6

* Misdiagnosing people of African descent with genomic variants misclassified as pathogenic, because most of our reference data comes from European/white males. https://www.nejm.org/doi/full/10.1056/NEJMsa1507092

* The dangers of ML in diagnosing melanoma exacerbating healthcare disparities for darker-skinned people. https://jamanetwork.com/journals/jamadermatology/article-abs...

And some other relevant, non-healthcare examples as well:

* When Google's hate speech detecting AI inadvertently censored anyone who used vernacular referred to in this article as "African American English". https://fortune.com/2019/08/16/google-jigsaw-perspective-rac...

* When Amazon's AI recruiting tool inadvertently filtered out resumes from women. https://www.reuters.com/article/us-amazon-com-jobs-automatio...

* When AI criminal risk prediction software used by judges in deciding the severity of punishment for those convicted predicts a higher chance of future offence for a young, black first time offender than for an older white repeat felon. https://www.propublica.org/article/machine-bias-risk-assessm...

Here's some good news, though:

* A hospital used AI to enable care and cut costs (though the reporting seems to oversimplify and gloss over enough to make the actual analysis of the results a little suspect). https://www.healthcareitnews.com/news/flagler-hospital-uses-...


snapetom 6 years ago

I agree 100% about how common it is. The industry also pays lip service to doing something about it. My last job was at a research institution, and we had a data ethics czar, a very smart guy (Stats PhD) whom I consider a friend. A lot of his job was to go around the org and to conferences talking about things like this.

While there's a lot of head-nodding, nothing is ever actually addressed in day-to-day operations. Data scientists barely know what's going on when they throw things through TensorFlow. What matters is the outcome and the confusion matrix at the end.
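To make the confusion-matrix point concrete, here is a minimal Python sketch. It assumes binary 0/1 labels and a hypothetical group attribute, and it uses made-up toy data that does not come from any of the systems discussed here. It breaks the confusion matrix down by group and reports each group's false positive and false negative rates. An aggregate false positive rate of 0.5 can hide one group at 0.25 and another at 0.75.

```python
from collections import defaultdict


def per_group_rates(y_true, y_pred, groups):
    """Return {group: (false_positive_rate, false_negative_rate)}.

    Assumes binary labels encoded as 0/1; `groups` is a hypothetical
    per-row group attribute used only for this illustration.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        if t == 1 and p == 1:
            c["tp"] += 1
        elif t == 0 and p == 1:
            c["fp"] += 1
        elif t == 0 and p == 0:
            c["tn"] += 1
        else:  # t == 1 and p == 0
            c["fn"] += 1

    rates = {}
    for g, c in counts.items():
        negatives = c["fp"] + c["tn"]
        positives = c["tp"] + c["fn"]
        # Guard against groups with no negatives or no positives.
        fpr = c["fp"] / negatives if negatives else float("nan")
        fnr = c["fn"] / positives if positives else float("nan")
        rates[g] = (fpr, fnr)
    return rates


# Toy, made-up data: two groups with identical true labels.
y_true = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1]
groups = ["A"] * 6 + ["B"] * 6

overall = per_group_rates(y_true, y_pred, ["all"] * len(y_true))
by_group = per_group_rates(y_true, y_pred, groups)
print(overall)   # {'all': (0.5, 0.0)}
print(by_group)  # {'A': (0.25, 0.0), 'B': (0.75, 0.0)}
```

The single aggregate matrix looks unremarkable here, but group B is wrongly flagged three times as often as group A. This is the gap you miss if the only thing checked before production is the overall matrix.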

I say this as someone who works in data and implements AI/ML platforms. Mr. Williams needs to find the biggest ambulance-chasing lawyer and file civil suits not only against the law enforcement agencies involved, but against everyone at DataWorks, top down, from the president to the data scientist to the lowly engineer who put this in production.

These people have the power to ruin lives. They need to be made an example of and held accountable for the quality of their work.

vmception 6 years ago

Sounds like a license for developing software is inevitable then.

mtgp1000 6 years ago

>When AI criminal risk prediction software used by judges in deciding the severity of punishment for those convicted predicts a higher chance of future offence for a young, black first time offender than for an older white repeat felon.

>When Amazon's AI recruiting tool inadvertently filtered out resumes from women

>When Google's hate speech detecting AI inadvertently censored anyone who used vernacular referred to in this article as "African American English"

There's simply no indication that these aren't statistically valid priors. And we have mountains of scientific evidence to the contrary, but if I dared post anything (cited, published literature) I'd be banned. This is all based on the unfounded conflation of equality of outcome with equality of opportunity, and on the erasure of evidence that genes and culture play a role in behavior and life outcomes.

This is bad science.

aspenmayer 6 years ago

Please post your sources. Your comments about

> the erasure of evidence of genes and culture playing a role in behavior and life outcomes

are concerning.

JangoSteve 6 years ago

> There's simply no indication that these aren't statistically valid priors. And we have mountains of scientific evidence to the contrary, but if I dared post anything (cited, published literature) I'd be banned.

I'd suggest reading the sources I posted in my comment before responding with ill-conceived notions. Literally every single example I posted linked to the peer-reviewed scientific evidence (cited, published literature) supporting the points I summarized.

The only link I posted without peer-reviewed literature was the last one, with the positive outcome, and that's the one I noted had suspect analysis.

mtgp1000 6 years ago

Let's just consider an example: where do you draw the line in the following list? To avoid sending travelers through unsafe areas:

1. Google's routing algorithm is conditioned on demographics
2. Google's routing algorithm is conditioned on income/wealth
3. Google's routing algorithm is conditioned on crime density
4. Google's routing algorithm cannot condition on anything that would disproportionately route users away from minority neighborhoods

I think the rational choice, to avoid forcing other people to take risks they may object to, is somewhere between 2 and 3. But the current social zeitgeist seems to allow only option 4, since an optimally sampled dataset will have very strong correlations among options 1-3, to the point that in most parts of the US they would all result in the same routing bias.