Source Code for ICSE 2022 Paper - "Can We Achieve Fairness Using Semi-Supervised Learning?"

Last update: Dec 18, 2021

Overview

Fair-SSL

Ethical bias in machine learning models has become a matter of concern in the software engineering community. Most prior software engineering work has concentrated on finding ethical bias in models rather than fixing it. After bias is found, the next step is mitigation. Prior researchers mainly used supervised approaches to achieve fairness. However, in the real world, obtaining data with trustworthy ground truth is challenging, and the ground truth itself can contain human bias. Semi-supervised learning is a domain of machine learning in which both labeled and unlabeled data are used to overcome data labeling challenges. In this work, we applied four popular semi-supervised techniques as pseudo-labelers to create fair classification models. Our framework, Fair-SSL, takes a very small amount (10%) of labeled data as input and generates pseudo-labels for the unlabeled data. We then synthetically generate new data points to balance the training data based on class and protected attribute, as proposed by Chakraborty et al. in FSE 2021. Finally, a classification model is trained on the balanced pseudo-labeled data and validated on test data. After experimenting on ten datasets and three learners, we found that Fair-SSL achieves performance similar to that of three other state-of-the-art bias mitigation algorithms. Whereas prior algorithms require a large amount of training data, Fair-SSL requires only 10% of the labeled training data. To the best of our knowledge, this is the first SE work in which semi-supervised techniques are used to fight ethical bias in ML models.
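The pipeline described above (keep 10% of the labels, pseudo-label the rest, balance by class and protected attribute, then train) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's code: a k-nearest-neighbour vote stands in for the four semi-supervised pseudo-labelers used in the paper, and random interpolation between two members of the same subgroup stands in for the Fair-SMOTE generator of Chakraborty et al. The function names `pseudo_label`, `balance_subgroups`, and `subgroup_sizes` and the toy data are hypothetical.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def pseudo_label(labeled, unlabeled, k=3):
    """Label each unlabeled vector by majority vote of its k nearest labeled rows.

    Hypothetical stand-in pseudo-labeler (plain k-NN); not one of the four
    semi-supervised techniques evaluated in the paper.
    """
    result = []
    for x in unlabeled:
        nearest = sorted(labeled, key=lambda row: sq_dist(row[0], x))[:k]
        votes = [y for _, y in nearest]
        result.append((x, max(set(votes), key=votes.count)))
    return result


def balance_subgroups(rows, protected_index, rng):
    """Oversample each (class, protected value) subgroup up to the largest one.

    Synthetic points interpolate between two random members of the same
    subgroup: a simplified stand-in for the Fair-SMOTE generator. Members of a
    subgroup share the protected value, so that column is left unchanged.
    """
    groups = defaultdict(list)
    for x, y in rows:
        groups[(y, x[protected_index])].append(x)
    target = max(len(members) for members in groups.values())
    balanced = []
    for (y, _), members in groups.items():
        grown = list(members)
        while len(grown) < target:
            a, b = rng.choice(members), rng.choice(members)
            t = rng.random()
            grown.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        balanced.extend((x, y) for x in grown)
    return balanced


def subgroup_sizes(rows, protected_index):
    """Count rows per (class, protected value) subgroup."""
    sizes = defaultdict(int)
    for x, y in rows:
        sizes[(y, x[protected_index])] += 1
    return dict(sizes)


# Toy data (hypothetical): column 0 is a binary protected attribute,
# column 1 a score in [0, 1]; the true label is 1 when the score exceeds 0.5.
rng = random.Random(0)
data = []
for _ in range(200):
    x = [rng.randint(0, 1), rng.random()]
    data.append((x, int(x[1] > 0.5)))

labeled = data[:20]                    # keep labels for 10% of the rows
unlabeled = [x for x, _ in data[20:]]  # hide the labels of the other 90%

train = labeled + pseudo_label(labeled, unlabeled)
train = balance_subgroups(train, protected_index=0, rng=rng)
print(subgroup_sizes(train, protected_index=0))  # every subgroup has the same count
```

Because every (class, protected attribute) subgroup ends up the same size, a classifier trained on `train` sees each combination of outcome and protected group equally often, which is the intuition behind the balancing step.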

Dataset Description -

1> Adult Income dataset - http://archive.ics.uci.edu/ml/datasets/Adult

2> COMPAS - https://github.com/propublica/compas-analysis

3> German Credit - https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29

4> Bank Marketing - https://archive.ics.uci.edu/ml/datasets/bank+marketing

5> Default Credit - https://archive.ics.uci.edu/ml/datasets/default+of+credit+card+clients

6> Heart - https://archive.ics.uci.edu/ml/datasets/Heart+Disease

7> MEPS - https://meps.ahrq.gov/mepsweb/

8> Student - https://archive.ics.uci.edu/ml/datasets/Student+Performance

9> Home Credit - https://www.kaggle.com/c/home-credit-default-risk

Data Preprocessing -

We have used the data preprocessing suggested by IBM AIF360.
Rows containing missing values are ignored, continuous features are converted to categorical ones (e.g., age < 25: young, age >= 25: old), and non-numerical features are converted to numerical ones (e.g., male: 1, female: 0). Finally, all feature values are normalized to the range 0 to 1.
For optimized preprocessing, please visit Optimized Preprocessing.
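The four preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not AIF360's implementation: the function name `preprocess`, the column names, the missing-value markers (`None`, `""`, and `"?"`, the last of which UCI Adult uses), and the choice to code young as 0 and old as 1 are all assumptions.

```python
MISSING = {None, "", "?"}  # assumed missing-value markers ("?" appears in UCI Adult)


def preprocess(rows, thresholds, encodings):
    """Return a list of numeric feature dicts normalized to [0, 1].

    rows       -- list of dicts mapping column name to raw value
    thresholds -- {column: cut}; value < cut -> 0, value >= cut -> 1 (hypothetical coding)
    encodings  -- {column: {raw category: number}}, e.g. {"sex": {"Male": 1, "Female": 0}}
    """
    # 1. Ignore rows that contain a missing value.
    complete = [r for r in rows if not any(v in MISSING for v in r.values())]

    encoded = []
    for r in complete:
        out = {}
        for col, value in r.items():
            if col in thresholds:
                # 2. Continuous -> categorical, e.g. age < 25 (young) vs age >= 25 (old).
                out[col] = 0 if float(value) < thresholds[col] else 1
            elif col in encodings:
                # 3. Non-numerical -> numerical via an explicit mapping.
                out[col] = encodings[col][value]
            else:
                out[col] = float(value)
        encoded.append(out)

    # 4. Min-max normalize every column into [0, 1]; constant columns become 0.0.
    if not encoded:
        return []
    for col in encoded[0]:
        lo = min(r[col] for r in encoded)
        hi = max(r[col] for r in encoded)
        span = hi - lo
        for r in encoded:
            r[col] = (r[col] - lo) / span if span else 0.0
    return encoded


raw = [
    {"age": 39, "sex": "Male", "hours": 40},
    {"age": 22, "sex": "Female", "hours": 20},
    {"age": 50, "sex": "?", "hours": 60},  # dropped: missing value
    {"age": 19, "sex": "Male", "hours": 60},
]
clean = preprocess(raw, thresholds={"age": 25}, encodings={"sex": {"Male": 1, "Female": 0}})
print(clean)
# [{'age': 1.0, 'sex': 1.0, 'hours': 0.5}, {'age': 0.0, 'sex': 0.0, 'hours': 0.0}, {'age': 0.0, 'sex': 1.0, 'hours': 1.0}]
```

Normalizing after encoding leaves a binary column that contains both values at 0 and 1, while numeric columns such as `hours` are rescaled into the same range.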

Code for technical report "An Improved Baseline for Sentence-level Relation Extraction".

The official PyTorch code implementation of "Personalized Trajectory Prediction via Distribution Discrimination" in ICCV 2021.

Safe Control for Black-box Dynamical Systems via Neural Barrier Certificates

A real-time approach for mapping all human pixels of 2D RGB images to a 3D surface-based model of the body

Random Forests for Regression with Missing Entries

NaturalCC is a sequence modeling toolkit that allows researchers and developers to train custom models

Improving Object Detection by Label Assignment Distillation

FaceQgen: Semi-Supervised Deep Learning for Face Image Quality Assessment

A general-purpose, flexible, and easy-to-use simulator alongside an OpenAI Gym trading environment for MetaTrader 5 trading platform (Approved by OpenAI Gym)

Face Mask Detector by live camera using tensorflow-keras, openCV and Python

SegNet-Basic with Keras

CCNet: Criss-Cross Attention for Semantic Segmentation (TPAMI 2020 & ICCV 2019).

Unofficial implementation of "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows" (https://arxiv.org/abs/2103.14030)

Near-Optimal Sparse Allreduce for Distributed Deep Learning (published in PPoPP'22)

Distributional Sliced-Wasserstein distance code

When BERT Plays the Lottery, All Tickets Are Winning

Improving the robustness and performance of biomedical NLP models through adversarial training

TensorFlow (Python) implementation of DeepTCN model for multivariate time series forecasting.

This is the pytorch implementation for the paper: Learning Accurate Performance Predictors for Ultrafast Automated Model Compression, which is in submission to TPAMI

Code for "Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification", ECCV 2020 Spotlight
