Addressing bias in generative AI starts with training data explainability

08 Mar 2024

In an age where artificial intelligence (AI) plays an increasingly influential role in our lives, understanding how and why AI models make the decisions they do is paramount. The term "AI explainability" refers to the capacity to unravel the complex inner workings of AI systems and understand and interpret their outputs. 

Ensuring the transparency and explainability of the data used to train your generative AI models is the first step toward addressing issues such as AI bias. But understanding how bias can manifest itself in seemingly neutral training data, how biased data can impact users of a model, and how to make AI training data explainable enough that issues like bias can be effectively addressed isn’t always easy. Let's dive into these concepts by exploring real-world examples of how bias can creep into generative AI systems undetected.

The power and pitfalls of AI training data

Training data is the lifeblood of AI systems. It's the raw material from which intelligent algorithms learn patterns, relationships and information. But, as the saying goes, "garbage in, garbage out." If AI training data is flawed or biased, the AI model's outputs will be flawed and biased as well.
This is particularly true in the case of generative AI, where systems create new outputs based on patterns learned from training data. Even small biases in training data can have significant impacts on generative AI outputs, potentially leading to discriminatory or unfair outcomes for end users. For example, an AI writing tool trained on a predominantly male-authored dataset may produce biased language that reinforces gender stereotypes. Similarly, a facial recognition system trained on a dataset that lacks diversity may struggle to accurately identify individuals from marginalized groups.
One of the main challenges with addressing bias in AI is that it can be difficult to detect. Biases can hide within seemingly impartial data, making them challenging to uncover and address. This is where applying the concept of AI explainability to AI training data becomes crucial. By providing transparency into the source, composition and quality of the data used to train an AI model, we can better understand how bias may have crept in and take steps to mitigate it. Additionally, through thorough testing and evaluation of AI systems, we can identify any potential biases in their outputs and work towards correcting them.
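To make the idea of transparency into a dataset's composition concrete, here is a minimal sketch of a representation audit that flags groups falling below a chosen share of the data. The records, the `region` attribute, and the 10% threshold are all hypothetical, purely for illustration; real audits would cover many attributes and far larger samples.

```python
from collections import Counter

def audit_composition(records, attribute, min_share=0.10):
    """Report each value's share of `attribute` in the dataset and
    flag values that fall below a minimum representation threshold."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    report = {}
    for value, count in counts.items():
        share = count / total
        report[value] = {"share": round(share, 3),
                         "underrepresented": share < min_share}
    return report

# Hypothetical training records tagged with a sensitive attribute
records = [
    {"text": "...", "region": "North America"},
    {"text": "...", "region": "North America"},
    {"text": "...", "region": "Europe"},
    {"text": "...", "region": "North America"},
    {"text": "...", "region": "South Asia"},
]
print(audit_composition(records, "region"))
```

An audit like this won't tell you *why* a group is missing, but it documents the composition of the data at a given point in time, which is exactly the kind of record that makes later bias investigations possible.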
Let's turn our attention to some of the ways that bias can be introduced into training data, as well as practical, actionable steps we can take to mitigate and address it by applying the concept of AI data explainability.

The unintended bias of data workers

Data workers who curate, clean and label the data used to train or fine-tune AI models play a pivotal role in ensuring the quality and fairness of generative AI systems. However, they can unknowingly introduce bias into the data – often through subtle, unintentional actions.
Let's examine a few scenarios that illustrate how this can occur.

Scenario 1: Lack of diversity or representation

Take, for example, a generative AI project with a limited budget for preparing training data or fine-tuning the AI. Due to these constraints, the data workers engaged on the project may be hired exclusively from lower-wage countries. As a result, the community of workers preparing the training data lacks diversity and isn’t fully representative of the users of the system. This can lead to the unintentional introduction of cultural biases, around factors such as gender, age, race or religion, into the data used to train the generative AI system.

Scenario 2: Ambiguous labelling

Imagine an AI data worker tasked with labelling images of people as "professionals" or "non-professionals" for a job recruitment AI. While labelling, they may subconsciously associate certain characteristics, like race or gender, with the "professional" label – introducing racial and gender biases into AI training data. This could result in the AI system being more likely to recommend candidates from specific demographics, leading to discriminatory hiring practices.

Scenario 3: Limited perspective

Consider a data worker involved in training an AI model designed to gauge customer sentiment towards a product or service, based on social media comments. If the data worker's cultural background or personal opinions are largely positive towards the product, they might label neutral or slightly negative comments as positive. This could skew the AI model's learning, resulting in an overly positive sentiment analysis that misleads the company, causing it to make ineffective business and strategy decisions.

Scenario 4: Reinforcing stereotypes

When preparing data to train a voice recognition system, data workers often transcribe spoken words into text. Unintentionally, they may adjust the transcriptions to align with common linguistic norms or stereotypes, inadvertently introducing biases to the training data. For example, if the system is trained on transcriptions that consistently replace non-standard grammatical constructs with standard versions (for example, "ain't" to "isn't"), it may struggle to recognize and correctly transcribe speech from some dialects or accents, reinforcing linguistic stereotypes.

The ripple effect: How bias trickles down

These are just a few examples of how bias can be introduced into AI training data. Once present, these biases can trickle through a generative AI system, ultimately reaching the end user and producing incorrect, biased, offensive or discriminatory outputs as a result of:
  • Learning biased associations: During model training, the biased training data informs the AI model's understanding of the world. It learns patterns, associations and biases present in the data. This means the model will be more likely to make predictions and produce outputs that align with the biases present in the training data.
  • Amplified stereotypes: Subtle biases present in the training data can be amplified during the generative process. The model may not only replicate existing biases but also exaggerate them in generated outputs. This occurs when the model, based on its learned biases, produces outputs that are more extreme or skewed than the input data.
  • New bias creation: Bias in training data can lead to the emergence of new, unintended biases in generative AI outputs. As the model learns patterns from the data, it might create associations or infer relationships that were not explicitly present in the training data, but align with the learned biases. This can result in the generation of content with novel biases that were inadvertently introduced during the learning process, further complicating efforts to identify and mitigate bias in the outputs.

Addressing bias through explainability

If your AI training data isn’t explainable, you won’t be able to trace a model’s biased outputs back to their root cause in the data, let alone correct them. Several strategies can be implemented to ensure data explainability and mitigate the risk of bias being introduced to a generative AI system through its training data:
  1. Know the source: Make sure you understand where your AI training data comes from and who prepared it. You should know where workers are located and how they will be hired, compensated and trained to minimize the potential for biases to be introduced to training data from the start.
  2. Ensure diversity and representation: Make sure the users of your generative AI are fully represented in the training data. Hire data workers from diverse backgrounds, locations and perspectives, and train them to recognize and mitigate bias.
  3. Promote transparency and document everything: Document data curation and labelling processes to identify potential sources of bias. Maintain proper documentation about the data used for AI training and any decisions made during the process to provide transparency into how the data was selected and labelled.
  4. Continuously monitor and update: Implement continuous monitoring and auditing of AI model outputs for bias and fairness, even after deployment. This allows for the detection of any biases that may have been missed during the training process, enabling updates and improvements to be made as needed.
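As a sketch of what step 4 can look like in practice, one widely used fairness check is demographic parity: comparing the rate of positive outcomes across groups in the model's logged outputs. The group labels and logged decisions below are hypothetical stand-ins for whatever your monitoring pipeline actually records.

```python
def demographic_parity_gap(outcomes):
    """Given (group, positive_outcome) pairs, return the largest gap in
    positive-outcome rate between any two groups, plus per-group rates."""
    totals, positives = {}, {}
    for group, positive in outcomes:
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if positive else 0)
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()), rates

# Logged outputs of a hypothetical recruitment model: (group, recommended?)
logged = [("A", True), ("A", True), ("A", False),
          ("B", True), ("B", False), ("B", False)]
gap, rates = demographic_parity_gap(logged)
# Group A is recommended 2/3 of the time, group B only 1/3 of the time
```

A persistent gap like this doesn't prove the model is biased on its own, but it is a signal to dig into the training data and labelling process, which is only possible if that data is explainable.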
Once you’re able to identify biases in AI training data, you’ll then be better equipped to apply bias mitigation techniques such as data resampling, data augmentation and feature selection/weighting to address them. By applying the concept of AI explainability to training data, we can make significant strides towards building generative AI systems that are more transparent, fair and accountable.
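Of the mitigation techniques mentioned above, data resampling is the simplest to illustrate: duplicating records from underrepresented groups (sampling with replacement) until every group is equally represented. This is a minimal sketch with hypothetical data; the attribute name and the balancing target are assumptions, and real projects combine resampling with the other techniques.

```python
import random

def oversample_to_balance(records, attribute, seed=0):
    """Oversample underrepresented groups (with replacement) so every
    group reaches the size of the largest group."""
    rng = random.Random(seed)
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r)
    target = max(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced

# Hypothetical imbalanced labelling data: 8 records for one group, 2 for another
data = ([{"label": "professional", "gender": "M"}] * 8 +
        [{"label": "professional", "gender": "F"}] * 2)
balanced = oversample_to_balance(data, "gender")
# Both groups now contribute 8 records each (16 total)
```

Oversampling is a blunt instrument: it rebalances representation but cannot fix labels that are themselves biased, which is why the explainability practices above come first.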
Planning a generative AI project? Download our generative AI decision roadmap to understand key decisions you should make upfront to ensure project success.
Vasagi Kothandapani

President, Enterprise Services and TrainAI, RWS
Vasagi is President of Enterprise Services, responsible for multiple global client accounts at RWS, as well as RWS’s TrainAI data services practice which delivers leading-edge AI training data solutions to global clients across a broad range of industries. She has 27 years of industry experience and has held various leadership positions in business delivery, technology, sales, product management, and client relationship roles in both product development and professional services organizations globally. She spent most of her career working in the technology and banking sectors, supporting large-scale technology and digital transformation initiatives.
Prior to joining RWS, Vasagi worked with Appen where she was responsible for managing a large portfolio of AI data services business for the company’s top global clients. Before that she spent two decades at Cognizant and CoreLogic in their banking and financial services practice, managing several banks and fintech accounts. Vasagi holds a Master’s degree in Information Technology, a Post Graduate Certificate in AI, and several industry certifications in Data Science, Architecture, Cybersecurity, and Business Strategy.