Inside AI
Bias in Artificial Intelligence
How bias can explode our AI models
There has been a lot of confusion about bias in the field of Artificial Intelligence. Let's try to understand it and uncomplicate a few things!
What do you see in this picture above?
- Apple? Apples? A Bushel of apples?
- Fruit? Bunch of fruits?
Okay, there is nothing wrong with these answers! Just notice one thing: how often do we say “red apples”? It feels very typical, right? In cognitive science, the idea that some members of a category are more typical than others is called Prototype Theory. It is a type of human bias, or cognitive bias, which results from the brain’s attempt to simplify information processing by interpreting the world around us in a way that affects our decisions and judgments. If you want to read more about cognitive bias, Neil deGrasse Tyson has explained it really well in his lectures.
Fact: More than 180 human biases have been defined and classified, and many of them find their way into the AI systems we design.
According to research from Boston University, most people tend to overlook the possibility that a surgeon could be a “she”. When we think of a doctor or a surgeon, we mostly tend to picture something like this!
So, what is bias? What types of bias are there? Why is it important for us? I will try to answer these questions in this blog series.
Bias can be defined as “A strong inclination of the mind or a preconceived opinion about something or someone” OR “Disproportionate weight in favor of or against an idea or thing”
One of the most important biases, the one where it all starts, comes from us humans. The starting point where bias gets picked up and then amplified further is called Human Reporting Bias: “The frequency with which people write about actions, outcomes, or properties is not a reflection of real-world frequencies or the degree to which a property is characteristic of a class of individuals”. Because biases are innate in humans, they get reflected in the data that humans produce.
The picture above shows that even the first step of training a model involves many biases. Biases do not exist only in the data; there are also cognitive biases that a scientist may bring while doing research, running experiments, or implementing algorithms. Let’s explore some of these in detail.
Biases in Data:
1. Selection Bias: Selection bias can result when the selection of subjects into a study, or their likelihood of being retained in the study, leads to a result that is different from what you would have gotten if you had enrolled the entire target population. In other words, the individuals or groups in a study differ systematically from the population of interest, leading to systematic errors in the outcome. Read more here.
- Sampling Bias: This is one type of selection bias. It is introduced by non-random sampling of the population and occurs when the selected sample is not a true representation of the whole population. Randomization is one of the remedies for sampling bias. For example, if you want to know when your website gets the most users and you look only at the daytime data (ignoring the nighttime data), you are introducing sampling bias into your prediction.
2. Prejudice Bias: This is a result of cultural influences and stereotypes. Attitudes that people hold about appearance, social class, gender, and so on get reflected in the data. For example, some people are biased towards a particular political affiliation, religion, or sex, while others disagree with that view. If a model learns that coders are men and homemakers are women, that is prejudice bias, because women can obviously code and men can cook. The issue is that the data set, consciously or unconsciously, reflects these social stereotypes.
3. Bias in Data Representation:
“Your model is as good as the dataset you use”. Representation bias captures the fact that, even if you collect data about all groups, some groups might not be represented as positively as others. Example: In one study published 15 years ago, two people applied for a job. Their résumés were about as similar as two résumés can be. One person was named Jamal, the other Brendan. The study found that white-sounding names received 50 percent more callbacks for interviews. This raises the question the study itself asked: are Emily and Greg more employable than Lakisha and Jamal?
4. Bias in data labeling: Label bias occurs when the set of labeled data is not fully representative of the entire universe of potential labels. For example, the image below shows one standard open-source image classifier trained on the Open Images dataset that does not properly apply “wedding” related labels to images of wedding traditions from different parts of the world.
Check out this blog from Google AI: https://ai.googleblog.com/2018/09/introducing-inclusive-images-competition.html
5. Measurement Bias: Sometimes the measurement technique or the instrument itself is inaccurate. When the data doesn’t reflect the real-world environment, bias is introduced into the model. Example: One survey team’s portable machine for measuring hemoglobin malfunctioned and was not checked every day, as it should have been. It measured everyone’s hemoglobin as 0.3 g/L too high. This would lead to an underestimate of the prevalence of anemia, because the readings would overestimate the hemoglobin of everyone measured by that team.
6. Confounder Bias: Sometimes a variable that is outside the scope of the existing analytical model affects both the exposure and the outcome under study, distorting the true relationship between them. Read more here.
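Two of the data biases above are easy to reproduce in a few lines. The sketch below is only an illustration: the hourly visit counts, the individual hemoglobin readings, and the anemia cut-off of 120 are made-up assumptions chosen to mirror the website and hemoglobin examples, not real data or a clinical reference.

```python
# Toy illustration of two data biases; all numbers are made up for the sketch.

# Hypothetical visits per hour of day (0-23) for a website.
HOURLY_VISITS = {
    0: 40, 1: 30, 2: 20, 3: 15, 4: 15, 5: 20, 6: 35, 7: 60,
    8: 90, 9: 110, 10: 120, 11: 130, 12: 140, 13: 150, 14: 135, 15: 125,
    16: 120, 17: 115, 18: 130, 19: 160, 20: 190, 21: 220, 22: 240, 23: 120,
}


def busiest_hour(visits):
    """Return the hour with the most visits in the given data."""
    return max(visits, key=visits.get)


# Sampling bias: keep only daytime hours (08:00-17:59) instead of all hours.
daytime_only = {h: v for h, v in HOURLY_VISITS.items() if 8 <= h < 18}

peak_all_hours = busiest_hour(HOURLY_VISITS)   # uses the whole population
peak_daytime = busiest_hour(daytime_only)      # uses the biased sample


def anemia_prevalence(readings, threshold):
    """Fraction of readings that fall below the anemia threshold."""
    return sum(r < threshold for r in readings) / len(readings)


# Measurement bias: a device that reads every value 0.3 units too high.
ASSUMED_THRESHOLD = 120.0   # hypothetical cut-off, not a clinical reference
OFFSET = 0.3                # constant instrument error from the example above
true_readings = [118.0, 119.8, 119.9, 120.1, 125.0]
measured_readings = [r + OFFSET for r in true_readings]

true_prevalence = anemia_prevalence(true_readings, ASSUMED_THRESHOLD)
measured_prevalence = anemia_prevalence(measured_readings, ASSUMED_THRESHOLD)

print("Busiest hour, all data:", peak_all_hours)
print("Busiest hour, daytime only:", peak_daytime)
print("True anemia prevalence:", true_prevalence)
print("Measured anemia prevalence:", measured_prevalence)
```

In this toy data the real peak is late in the evening, but the daytime-only sample points to early afternoon. Likewise, a small constant offset pushes borderline readings over the cut-off, so the measured prevalence comes out lower than the true one.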
Biases in Data Interpretation:
1. Confirmation Bias: It is the tendency to search for, interpret, favor, and recall information in a way that confirms or strengthens one’s personal beliefs. For example, imagine that a person believes left-handed people are more creative than right-handed people. Whenever this person meets someone who is both left-handed and creative, they place greater importance on this “evidence” that supports what they already believe.
2. Overgeneralization: We tend to draw faulty or inaccurate conclusions based on just a few examples or very little data. Sometimes the available information is too general, which is often an effect of skewed data. Drawing conclusions that are too broad, because they exceed what can logically be concluded from the available information, is overgeneralization bias.
3. Automation Bias: The propensity for humans to favor suggestions from automated decision-making systems and to ignore contradictory information made without automation, even if it is correct. We tend to believe that the complicated mathematical system behind the suggestion must surely generate the right answer. For example, there are numerous cases of people blindly following GPS, like the group of tourists in Australia who drove into the Pacific Ocean. This sort of thing happens so often in Death Valley, California, that the local rangers have coined the term “death by GPS”. In another case, a couple set out for Las Vegas and ended up crossing the border. There are many equivalent examples in aviation, where pilots have trusted automated navigation systems even when their own best judgment suggested otherwise.
4. Correlation Fallacy: When you find a correlation between two variables, it is often very tempting to assume that one of them causes the other. But just because things vary together does not mean one causes the other. The fallacy is to claim a cause-and-effect relationship between two things simply because they occur together.
Read more here: “Correlation is not causation”
Why is understanding bias important?
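The correlation fallacy (and its close cousin, confounder bias) can be seen in a small simulation. This is a minimal sketch under stated assumptions: the hidden variable, the noise levels, and the sample size are all hypothetical choices, and the data is randomly generated rather than observed.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 2000

# Hypothetical hidden common cause (for example, daily temperature).
hidden = [rng.gauss(0, 3) for _ in range(n)]

# Two observed variables that each depend on the hidden cause plus their
# own independent noise. Neither one influences the other.
a = [z + rng.gauss(0, 1) for z in hidden]
b = [z + rng.gauss(0, 1) for z in hidden]

corr_raw = pearson(a, b)

# Remove the hidden cause's contribution from both variables; what is left
# is only the independent noise.
a_adjusted = [x - z for x, z in zip(a, hidden)]
b_adjusted = [y - z for y, z in zip(b, hidden)]
corr_adjusted = pearson(a_adjusted, b_adjusted)

print(f"Correlation of a and b: {corr_raw:.2f}")
print(f"Correlation after removing the hidden cause: {corr_adjusted:.2f}")
```

The two variables move together strongly even though neither causes the other. Once the shared hidden cause is accounted for, the relationship should all but disappear.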
“The data is just like textbooks for ML models, these textbooks have Human authors”
Human-reported data perpetuates human biases. The systems that use this data are being deployed in the real world, in healthcare, financial systems, human resources, the justice system, policing, and so on. These systems tend to amplify the bias further and generate new data, which is again fed back into the system for making predictions. This whole process is called the “Bias network effect” or “Bias Laundering”. These real-world implications of ML models make understanding bias more and more important. We definitely do not want this bias to be amplified and reflected in real-world use cases. Read more here.
I also believe that we (humans) are all biased in one way or another. If my belief that my favorite team will win the match today, or my optimism about this blog, keeps me involved and writing more, then why not? Now you may ask, “Is bias good or bad?”
Biases can mainly be categorized into:
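To get a feel for how such a loop can amplify a small skew, here is a deliberately simplified sketch. The update rule is an assumption made for illustration (it is not taken from any real system): in each round, the system pays disproportionately more attention to whichever group already dominates its data, and that attention produces the next round's data.

```python
def feedback_step(share, exponent=2.0):
    """One retraining round of a toy feedback loop.

    `share` is the fraction of recorded data that points at group A.
    An exponent above 1 is an assumption: it models a system that gives
    disproportionately more attention to whichever group already dominates
    the data.
    """
    weight_a = share ** exponent
    weight_b = (1.0 - share) ** exponent
    return weight_a / (weight_a + weight_b)


def simulate(initial_share, rounds):
    """Return the share of group A after each round, starting value included."""
    history = [initial_share]
    for _ in range(rounds):
        history.append(feedback_step(history[-1]))
    return history


skewed = simulate(initial_share=0.6, rounds=5)
balanced = simulate(initial_share=0.5, rounds=5)

print("Skewed start:  ", [round(s, 3) for s in skewed])
print("Balanced start:", [round(s, 3) for s in balanced])
```

With a perfectly balanced start nothing changes, but a modest 60/40 skew grows every round until the data is almost entirely about one group. Real systems are far messier, yet the mechanism is the same: the output of a biased model becomes the input of the next one.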
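- Statistical Bias: The systematic difference between the value a model or estimator produces on average and the true value it tries to predict. Note that the “bias” term b in Y = mX + b is the intercept of a linear model, which is a related but different use of the word.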
- Cognitive Bias: A systematic error in human thinking that occurs when people process information about the world around them and that affects their decisions and judgments. Example: Optimism/Pessimism Bias, Confirmation Bias, Self-serving Bias, Negativity Bias.
- Algorithmic Bias: The unjust, prejudicial treatment shown by an algorithmic decision-making system.
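Of the three categories, statistical bias is the one that can be computed exactly. One common formal reading is: bias = (expected value of an estimator) - (true value). The sketch below assumes a tiny hypothetical population of three numbers so that every possible sample can be enumerated, and compares two ways of estimating the variance.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy population; small enough to enumerate every sample.
POPULATION = [1, 2, 3]
SAMPLE_SIZE = 2


def variance(values, divisor):
    """Sum of squared deviations from the mean, divided by `divisor`."""
    mean = Fraction(sum(values), len(values))
    return sum((v - mean) ** 2 for v in values) / divisor


# True quantity we are trying to estimate: the population variance.
true_variance = variance(POPULATION, len(POPULATION))

# Every equally likely sample of size 2 drawn with replacement.
samples = list(product(POPULATION, repeat=SAMPLE_SIZE))


def expected_value(divisor):
    """Average of the variance estimator over all possible samples."""
    return sum(variance(s, divisor) for s in samples) / len(samples)


# Statistical bias = expected value of the estimator - true value.
bias_divide_by_n = expected_value(SAMPLE_SIZE) - true_variance
bias_divide_by_n_minus_1 = expected_value(SAMPLE_SIZE - 1) - true_variance

print("True variance:", true_variance)
print("Bias when dividing by n:", bias_divide_by_n)
print("Bias when dividing by n - 1:", bias_divide_by_n_minus_1)
```

Dividing by n underestimates the variance on average (the bias is -1/3 here), while dividing by n - 1 has zero bias. Nothing about this is unfair or prejudiced; it is simply a systematic gap between an estimate and the truth, which is why it helps to keep statistical bias separate from the other two categories.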
In most cases, when we are concerned about bias, we mean algorithmic bias. Algorithms pick up this bias from the data, which is created by humans and includes human biases. Now, why is it a problem if our model picks up a bias that was already there in the data or in examples from the real world? Let me give you an example. Even if a teacher is a strong believer in a particular religion, school of thought, or political opinion, do you think it would be fair for the students to pick up that biased opinion from the teacher? A fair answer would be that the students should get a neutral view from their teacher. In the same way, our ML models are our creations, and we don’t want them to pick up these biases from us. If we want AI systems to guide us humans towards better decision-making, of course we don't want them to be biased.
A lot of current work goes into understanding algorithmic bias in models and trying to remove it, so that we can keep developing this technology.
“Although neural networks might be said to write their own programs, they do so towards goals set by humans, using data collected for human purposes. If the data is skewed, even by accident, the computers will amplify injustice. If the measures of success that the networks are trained against are themselves foolish or worse, the results will appear accordingly” — By Guardian
Conclusion:
- Learning to identify these biases can help us overcome our own implicit biases.
AI can unintentionally lead us in the wrong direction through:
- Sources of bias in the Data
- Feedback Loops
- Lack of careful evaluation of Model
- Interpretability and Explainability Problems in Model
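As a small illustration of what careful evaluation can look like, the sketch below compares overall accuracy with accuracy per group. The records, the group names, and the labels are hypothetical and exist only to show the idea.

```python
from collections import defaultdict

# Hypothetical evaluation records: (group, true label, predicted label).
RECORDS = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 0),
    ("B", 1, 0), ("B", 0, 0), ("B", 1, 0), ("B", 1, 1),
]


def accuracy(records):
    """Fraction of records whose prediction matches the true label."""
    return sum(y_true == y_pred for _, y_true, y_pred in records) / len(records)


def accuracy_by_group(records):
    """Accuracy computed separately for each group."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[0]].append(record)
    return {group: accuracy(rows) for group, rows in grouped.items()}


overall = accuracy(RECORDS)
per_group = accuracy_by_group(RECORDS)

print("Overall accuracy:", overall)
print("Accuracy by group:", per_group)
```

An overall accuracy of 80% looks fine here, but it hides the fact that the model is right every time for group A and only half the time for group B. A single headline metric can easily mask this kind of gap.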
A good read on the topic:
- “I Know Some Algorithms Are Biased — because I Created One”
- “Biased Algorithms Are Easier to Fix Than Biased People”
- https://catalogofbias.org/biases/