

Date: 2024-05-19 07:53

CSOCMP5328 - Advanced Machine Learning

Bias and Fairness in Large Language Models (LLMs)

This is a group assignment, 2 to 3 students only. This is NOT an individual assignment. It is worth

25% of your total mark.

1. Introduction

Generative AI models have garnered significant attention and adoption in various domains due to

their remarkable output quality. Nevertheless, these models, reliant on massive, internet-sourced

datasets, exhibit vulnerabilities that have sparked debate over important ethical concerns, especially

around fairness, pertaining to the amplification of human biases and a potential decline in


This assignment asks you to investigate methods for bias mitigation in generative AI models and to propose your own method for mitigating bias in LLMs. Fairness is paramount in two critical areas, Text-to-Text and Text-to-Image; this assignment focuses specifically on the Text-to-Text problem.

● Text-to-Text using Large Language Models (LLMs): This area covers prominent language models such as Llama-2, BERT, T5, GPT-2/3, and ChatGPT, and examines the potential for these models to generate biased textual content and the implications of that bias.

1.1 Common biased categories

To contextualise our investigation, we have identified several common categories of bias that

may manifest within generative AI models:

● Gender and Occupations: One significant aspect involves exploring biases related to

gender disparities in various professions. By analysing the output of generative models, we

can discern whether these models tend to associate specific careers more with one gender

over another, thus potentially perpetuating occupational stereotypes, for example:

○ Text-to-Text: GPT-2 may generate text that reinforces traditional gender

stereotypes. For example, it might associate caregiving with women and leadership

with men, perpetuating societal biases. Example: "She is a nurturing mother,

always putting her family first."

○ Text-to-Image: The results generated by Stable Diffusion for the prompt “A photo

of a firefighter.”  

● Race / Ethnicity: Another critical dimension involves assessing biases related to race and


○ Text-to-Text: GPT-2 may generate text that perpetuates racial stereotypes or generalisations about specific racial or ethnic groups, for example: "Asian people are naturally good at math." The model may also generate content that oversimplifies or misrepresents the cultures and traditions of certain racial or ethnic groups, for example: "All Latinos are passionate dancers."

○ Text-to-Image: The bias results for “intelligent person” using Image Search


Addressing bias and fairness in generative AI represents a complex and ongoing challenge.

Researchers and developers are actively engaged in devising a range of techniques aimed at bias

detection and mitigation. These approaches include the diversification of training data sources, the

development of ethical guidelines for AI development, and the creation of algorithms designed

explicitly to identify and rectify bias within AI-generated outputs.

1.2 Safety

Generative AI is used in intentionally harmful ways. This includes misusing generative AI to

generate child sexual exploitation and abuse material based on images of children, or generating

sexual content that appears to show a real adult and then blackmailing them by threatening to

distribute it over the internet. Generative AI can also be used to manipulate and abuse people by

impersonating human conversation convincingly and responding in a highly personalised manner,

often resembling genuine human responses.

Note: The resultant figures from Stable Diffusion are only presented to demonstrate the bias. This

assignment is only for "text-based bias and fairness" in LLMs.

2. A Guide to Using the Datasets

To effectively investigate and assess bias within generative AI models for Text-to-Text, it is crucial

to select appropriate datasets that reflect real-world scenarios and challenges. Depending on your

chosen focus, you may need to find specific datasets for your area of investigation e.g., healthcare,

sports, or entertainment datasets. We provide some examples below; however, you are free to choose any dataset not listed. Several datasets are used for LLM bias evaluation [1]; refer to https://github.com/i-gallegos/Fair-LLM-Benchmark for more information.

These datasets are for evaluation only: do not train your model on them.

Depending on your research objectives, select training datasets that align with your area of investigation:


● Access the chosen datasets through official sources, research papers, or relevant


● Download the training dataset(s) to your local environment. Ensure that you adhere to any licensing or usage terms associated with the dataset(s). Depending on the debiasing techniques employed, retraining the model may be necessary. Commonly used training datasets include Common Crawl, Wikipedia, BookCorpus, PubMed, and arXiv for text, and ImageNet, COCO, VQA, and Flickr30k for image or vision-language models.

● Pre-process the dataset as necessary for compatibility with your chosen de-biasing (i.e., fairness-enabling) methods in your generative AI model. Consider factors such as label imbalance among demographic groups in the training data, as this can lead to bias. One common method for addressing this is counterfactual data augmentation (CDA) [1], which balances labels. Other pre-processing techniques adjust harmful information in the data or eliminate potentially biased texts. Identify and handle harmful text subsets using different methods to ensure a fairer training corpus.

● Integrate the pre-processed dataset(s) into your code for training and evaluation. Ensure

that you have the appropriate data loading and pre-processing routines in place to work

seamlessly with generative AI models.
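As a rough illustration of the counterfactual data augmentation (CDA) step mentioned in the list above, the sketch below swaps a handful of gendered words so that each sentence also appears in its counterfactual form. The word list `SWAP_PAIRS` and the function names `counterfactual` and `augment` are assumptions made for this example only; a real study would need a curated lexicon and a strategy for ambiguous words such as "her" (which can map to "him" or "his").

```python
import re

# Hypothetical, deliberately tiny swap list. Ambiguous pronouns such as
# "her"/"his" are left out on purpose because they need context to swap.
SWAP_PAIRS = [("he", "she"), ("man", "woman"), ("men", "women"),
              ("father", "mother"), ("son", "daughter"), ("boy", "girl")]

# Build a bidirectional lookup: he -> she and she -> he, and so on.
SWAP = {}
for left, right in SWAP_PAIRS:
    SWAP[left] = right
    SWAP[right] = left

# Match whole words only, so "the" is never treated as "he".
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, SWAP)) + r")\b",
                      re.IGNORECASE)


def counterfactual(sentence: str) -> str:
    """Return the sentence with every listed gendered word swapped."""
    def _swap(match):
        word = match.group(0)
        replacement = SWAP[word.lower()]
        # Keep a leading capital so sentence-initial words stay capitalised.
        return replacement.capitalize() if word[0].isupper() else replacement
    return _PATTERN.sub(_swap, sentence)


def augment(corpus):
    """Return the corpus plus a counterfactual copy of each changed sentence."""
    augmented = []
    for sentence in corpus:
        augmented.append(sentence)
        swapped = counterfactual(sentence)
        if swapped != sentence:  # skip sentences with no gendered words
            augmented.append(swapped)
    return augmented


if __name__ == "__main__":
    for line in augment(["He is a father and the woman is a doctor.",
                         "It rains."]):
        print(line)
```

Sentences without any listed word are kept once, so the augmented corpus grows only where a counterfactual actually exists.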

Remember that data pre-processing and formatting are crucial steps in ensuring that the datasets

are ready for input into your generative AI models. Additionally, make sure to document your

dataset selection and pre-processing steps thoroughly in your research report for transparency and


3. Performance Evaluations

Most fairness metrics for LLMs can be categorised by what they use from the model such as the

embeddings, probabilities, or generated text, including:

● Embedding-based metrics: Use dense vector representations, typically contextual sentence embeddings, to measure bias.

● Probability-based metrics: Use the model-assigned probabilities to estimate bias (e.g., to score text pairs or answer multiple-choice questions).

● Generated text-based metrics: Use the model-generated text conditioned on a prompt (e.g., to measure co-occurrence patterns or compare outputs generated from perturbed prompts).
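The three metric families can be sketched with toy inputs as follows. This is only an illustration under stated assumptions: `association` follows the WEAT-style difference of mean cosine similarities, `stereotype_preference_rate` expects a caller-supplied `log_prob` function (in practice this would come from the chosen LLM), and `cooccurrence_gap` uses simple word matching. All function names are hypothetical and none of them is a prescribed metric for this assignment.

```python
import math
import re


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, attrs_a, attrs_b):
    """Embedding-based: mean cosine of w to set A minus mean cosine to set B."""
    mean_a = sum(cosine(w, a) for a in attrs_a) / len(attrs_a)
    mean_b = sum(cosine(w, b) for b in attrs_b) / len(attrs_b)
    return mean_a - mean_b


def stereotype_preference_rate(pairs, log_prob):
    """Probability-based: share of (stereotypical, anti-stereotypical) pairs
    where the model scores the stereotypical sentence higher.
    A value near 0.5 suggests no systematic preference."""
    preferred = sum(1 for stereo, anti in pairs
                    if log_prob(stereo) > log_prob(anti))
    return preferred / len(pairs)


def cooccurrence_gap(generations, target, group_a, group_b):
    """Generated-text-based: among texts mentioning `target`, the share that
    mention a group_a word minus the share that mention a group_b word."""
    hits_a = hits_b = total = 0
    for text in generations:
        tokens = set(re.findall(r"\w+", text.lower()))
        if target not in tokens:
            continue
        total += 1
        hits_a += bool(tokens & group_a)
        hits_b += bool(tokens & group_b)
    return (hits_a - hits_b) / total if total else 0.0


if __name__ == "__main__":
    # Toy 2-d "embeddings" and a toy scorer stand in for a real model.
    print(association([1.0, 0.0], [[1.0, 0.0]], [[0.0, 1.0]]))
    print(stereotype_preference_rate([("ab", "abc"), ("abcd", "a")],
                                     log_prob=lambda s: -len(s)))
    print(cooccurrence_gap(["The nurse said she was tired.",
                            "The nurse said he was late."],
                           "nurse", {"she"}, {"he"}))
```

Passing `log_prob` in as a parameter keeps the sketch independent of any particular model library; only that one function needs replacing to score a real LLM.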


4. Tasks

Your main tasks are:

● Research: Conduct in-depth research to identify various methods for addressing bias in generative AI. Ensure you understand the theoretical foundations and practical implementation of these methods. Provide a comprehensive comparison of the methods based on the evaluations conducted, and discuss their contributions, evaluation methods, strengths, and weaknesses (this will help with the Related Work section of the report).

● Proposed Mathematical Model:

○ Choose a language model, such as Llama-2, BERT, T5, GPT-2/3, or ChatGPT, from which you would like to remove bias. Write a mathematical model for your proposed approach covering: the representation of the training datasets (as a database, feature sets, etc.), the preprocessing steps applied to the training datasets, the objective and optimisation method employed, model training using the LLM, and the evaluation metrics used to evaluate your model. Provide a comprehensive table listing all notation with descriptions.

○ Write algorithms that show all the steps of the proposed approach, including system initialisation, training/testing, bias evaluation, results evaluation, and any other steps that show how your proposed approach is implemented.

○ Show a schematic representation of your proposed approach.

● Code Development:

○ Implement the selected bias mitigation methods, based on the proposed

mathematical model.

○ Train the model using selected LLM with the pre-processed dataset (if needed).

○ Evaluate the bias, show experimental evaluations of various metrics, generate their

corresponding figures.

○ The code (including the interface for training the model using the LLM and the results evaluation) must be written in Python 3. You may use any external libraries for performance comparisons; however, you need to provide details on how the libraries were set up and how the evaluation metrics were used, in the Appendix.


● Evaluation:

○ Run the chosen model on the evaluation datasets before applying any debiasing technique and show, via various prompts, whether bias exists. These results are termed the baseline.

○ Pre-process the dataset and train the model with your proposed method. Evaluate the trained model via various prompts to demonstrate that you have addressed the bias. Note that some debiasing techniques may not require retraining the model.

○ Compare the performance of proposed method with the baseline.

○ Evaluate other performance metrics, e.g., utility, training time, average, and standard deviation. Note that some evaluation metrics might not be applicable to your proposed scenario; hence, you must actively consider which evaluation metrics determine the applicability of your model. A comprehensive literature survey will help you find how authors have evaluated bias and enabled fairness in generative AI models.

○ Important: Please note that this is our understanding of how to carry out this study and its evaluation, i.e., show the bias of the chosen model via prompts → apply the chosen debiasing technique (for example, pre-process the dataset to remove label imbalance and re-train the model with the pre-processed dataset) → show via prompts that you have addressed the bias → compare the baseline with the proposed approach. If you think that this might not work, you need to come up with other techniques.

● Conclude:

○ Conclude your findings and show the strengths and weaknesses of your proposed approach.

○ Provide a hypothetical comparison of your approach with other approaches in the literature. This comparison could be based on various performance metrics.

○ Provide future research directions on how to mitigate those weaknesses.

○ Provide comprehensive directions on how your proposed model could be generalised and applied to various application scenarios, e.g., social media applications, stock markets, or health or sports analytics.
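The evaluation workflow described in the task list (measure a baseline via prompts, apply a debiasing technique, then compare) could be wired together roughly as below. Everything here is a stand-in chosen for illustration: `baseline_generate` and `debiased_generate` are hypothetical placeholders for the original and debiased LLM, and `gendered_score` is a toy score, not a recommended bias metric.

```python
import statistics

PROMPTS = ["The nurse said that", "The engineer said that",
           "The teacher said that", "The pilot said that"]


def baseline_generate(prompt):
    """Hypothetical stand-in for the original model (stereotyped pronouns)."""
    if "nurse" in prompt or "teacher" in prompt:
        return prompt + " she was busy."
    return prompt + " he was busy."


def debiased_generate(prompt):
    """Hypothetical stand-in for the model after debiasing."""
    return prompt + " they were busy."


def gendered_score(text):
    """Toy score: 1.0 if the text contains a gendered pronoun, else 0.0."""
    tokens = set(text.lower().replace(".", "").split())
    return 1.0 if tokens & {"he", "she"} else 0.0


def evaluate(generate, prompts, score):
    """Score one generation per prompt."""
    return [score(generate(prompt)) for prompt in prompts]


def summarise(scores):
    """Mean and sample standard deviation of a list of scores."""
    stdev = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return {"mean": statistics.mean(scores), "stdev": stdev}


def compare(baseline_scores, proposed_scores):
    """Summarise both runs and report the change in the mean score."""
    base = summarise(baseline_scores)
    prop = summarise(proposed_scores)
    return {"baseline": base, "proposed": prop,
            "mean_change": prop["mean"] - base["mean"]}


if __name__ == "__main__":
    report = compare(evaluate(baseline_generate, PROMPTS, gendered_score),
                     evaluate(debiased_generate, PROMPTS, gendered_score))
    print(report)
```

The same `compare` helper can be reused for other metrics such as utility or training time, as long as both runs are scored on the same prompts.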

Note: The steps above are written in considerable detail. If you still have any ambiguity about those steps, implementation or technical questions, or doubts about the problem scenario, please do your own research and share your findings on Ed so that other students can also get an idea of how to deal with specific steps. Furthermore, please post your concerns or questions on Ed under the “Assignment 2” thread; our teaching team will be happy to share their experience and suggestions. Please note that this is an open research assignment: use your own creativity to develop your understanding of this problem scenario and its solution.

4.1 Report

The report should be organised like a research paper and should contain at least the following sections:

Abstract:

• Clearly introduces the topic scenario and its significance.

• Provides a concise summary of the proposed evaluation method.

• Provides the results from various evaluation metrics.

• Concludes with your contributions and discusses their applicability in real-world scenarios.

Introduction:


• Clearly introduces the problem of bias in generative AI and its importance.

• Provides a clear and detailed overview of the proposed methods.

• Describes the contributions in detail, e.g., pre-processing, experimental setup, mathematical model, proposed evaluation method and metrics, and the various steps taken to evaluate your results.


• Provide discussion on the key results and show the organisation of your report at the end

of this section.

Related Work:

• Provides a comprehensive review of related debiasing and fairness methods.

• Discusses the advantages and disadvantages of the reviewed methods in the literature.

• Demonstrates understanding of the existing literature.

• Provide a summarised table of the existing works and show their contributions, evaluation

method, strengths, and weaknesses of existing work.

Proposed Method:

• Explains the theoretical foundations of the proposed solution effectively.

• Describes the details of debiasing methods clearly, including the objective function.

• Presents the algorithmic representation of the proposed solution comprehensively.

• Shows a schematic representation of the proposed approach.

Experiments/Evaluations:


• Provides a clear description of the experimental setup, including datasets, algorithm

evaluations, and metrics.

• Presents experimental results effectively, with appropriate figures.

• Conducts a thorough analysis and comparison of baseline and proposed method.

• Provides detailed insights on the results.

Conclusion:


• Effectively summarises the methods and results.

• Provides valuable insights or suggestions for future work.

• Provides the strengths and weaknesses of your work, together with future directions.

References:


• Lists all references, cited in the report.

• Formats all references consistently and correctly.

Appendix:


• Provide instructions on how to run your code.

• Provide additional/supporting figures or experimental evaluations.

Note: Please follow the LaTeX report format provided on Canvas.

5. Submission guidelines

1. Go to Canvas and upload the following files/folders compressed together as a zip file.

● Report (a PDF file)

The report should include the details of all members (student IDs and names).

● Code (a folder):

○ Algorithm (a sub-folder): Your code (could be multiple files or a project).

○ Input data (a sub-folder): Empty. Please do NOT include the datasets in the zip file, as they are large. Please provide detailed instructions on how the datasets are used and how to download them. We will copy the datasets to the input folder when we test the code.

2. A plagiarism checker will be used, both for code and report.

3. A penalty of MINUS 20 percent marks (−20%) applies for each day after the due date. The maximum delay is 5 (five) days; after that, assignments will not be accepted.

Note: Only one student needs to submit the zip file, which must be named with the student ID numbers of all group members separated by underscores, e.g., “xxxxxxxx_xxxxxxxx_xxxxxxxx.zip”, and must contain the report and all relevant files. Please write the names and email addresses of all members in the report.
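One possible way to assemble the submission archive is sketched below. The folder name `input`, the helper names `archive_name` and `build_submission`, and the placeholder student IDs are all assumptions for illustration; the brief itself only fixes the underscore-separated zip name, the report PDF, the code folder, and an empty input-data sub-folder.

```python
import zipfile
from pathlib import Path

INPUT_DIR_NAME = "input"  # assumed name of the (empty) input-data sub-folder


def archive_name(student_ids):
    """Join student IDs with underscores, e.g. 11111111_22222222.zip."""
    return "_".join(student_ids) + ".zip"


def build_submission(student_ids, report_pdf, code_dir, out_dir="."):
    """Zip the report and the code folder, leaving the input folder empty."""
    report_pdf, code_dir = Path(report_pdf), Path(code_dir)
    out_path = Path(out_dir) / archive_name(student_ids)
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(report_pdf, arcname=report_pdf.name)
        for path in sorted(code_dir.rglob("*")):
            rel = path.relative_to(code_dir)
            if rel.parts[0] == INPUT_DIR_NAME:
                continue  # datasets must not be shipped in the archive
            if path.is_file():
                zf.write(path, arcname=(Path(code_dir.name) / rel).as_posix())
        # Add an empty directory entry so the input folder still exists.
        zf.writestr(f"{code_dir.name}/{INPUT_DIR_NAME}/", "")
    return out_path


if __name__ == "__main__":
    # Placeholder IDs; replace with the real student ID numbers.
    print(archive_name(["11111111", "22222222", "33333333"]))
```

Checking `zipfile.ZipFile(path).namelist()` on the result before uploading is a quick way to confirm that no dataset files slipped in.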

Example References:

1. Bias and Fairness in Large Language Models: A Survey. Isabel O. Gallegos, Ryan A. Rossi, Joe Barrow, Md Mehrab Tanjim, Sungchul Kim, Franck Dernoncourt, Tong Yu, Ruiyi Zhang, Nesreen K. Ahmed. https://arxiv.org/abs/2309.00770

2. A Survey on Fairness in Large Language Models. Yingji Li, Mengnan Du, Rui Song, Xin Wang, Ying Wang. https://arxiv.org/abs/2308.10149

3. Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness. Felix Friedrich, Manuel Brack, Lukas Struppek, Dominik Hintersdorf, Patrick Schramowski, Sasha Luccioni, Kristian Kersting. https://arxiv.org/abs/2302.10893

4. Stable Bias: Analyzing Societal Representations in Diffusion Models. Alexandra Sasha Luccioni, Christopher Akiki, Margaret Mitchell, Yacine Jernite.


6. Marking Rubrics

Criterion Marks Comments

Coding (30 Marks):

• The code will be run to check that it works properly and produces the figures and all evaluations presented in the report.

Abstract (5 Marks):

• Clearly introduces the topic scenario and its significance. (1 Mark)

• Provides a concise summary of the proposed evaluation method. (2 Marks)

• Provides the results from various evaluation metrics. (1 Mark)

• Concludes with your contributions and discusses their applicability in real-world scenarios. (1 Mark)

Introduction (10 Marks):

• Clearly introduces the problem of bias in generative AI

and its importance. (3 Marks)

• Provides a clear and detailed overview of the proposed

methods. (3 Marks)

• Describes the contributions in detail, e.g., pre-processing, experimental setup, mathematical model, proposed evaluation method and metrics, and the various steps taken to evaluate your results. (2 Marks)

• Provides a discussion of the key results and shows the organisation of your report at the end of this section. (2 Marks)


Related Work (10 Marks):

• Provides a comprehensive review of related debiasing

and fairness methods. (3 Marks)

• Discusses the advantages and disadvantages of the

reviewed methods in the literature. (3 Marks)

• Demonstrates understanding of the existing literature. (2 Marks)


• Provide a summarised table of the existing works and

show their contributions, evaluation method, strengths,

and weaknesses of existing work. (2 Marks)


Proposed Method (20 Marks):

• Explains the theoretical foundations of the proposed

solution effectively. (7 Marks)

• Describes the details of debiasing methods clearly,

including the objective function. (4 Marks)

• Presents the algorithmic representation of the proposed

solution comprehensively. (7 Marks)

• Shows schematic representation of proposed approach.

(2 Marks)

Experiments/Evaluations (20 Marks):

• Provides a clear description of the experimental setup,

including datasets, algorithm evaluations, and metrics.

(7 Marks)

• Presents experimental results effectively, with

appropriate figures. (7 Marks)

• Conducts a thorough analysis and comparison of

baseline and proposed method. (4 Marks)

• Provides detailed insights on the results. (4 Marks)

Conclusion (5 Marks):

• Effectively summarises the methods and results. (1 Mark)


• Provides valuable insights or suggestions for future

work. (2 Marks)

• Provide strengths and weaknesses of your work,

furthermore, provide future directions. (2 Marks)


References:

• Lists all references cited in the report.

• Formats all references consistently and correctly.

Overall Presentation (10 Marks):

• Maintains a clear and logical structure throughout the

report. (5 Marks)

• Demonstrates excellent writing quality, including clarity

and coherence. (3 Marks)

• Adheres to formatting and citation guidelines

consistently. (2 Marks)

Total: 100 Marks

