Synthetic Data for Fairness: Bias Mitigation in Facial Attribute Recognition

This repository contains the official codebase for our study on bias mitigation in soft facial attribute recognition.

We present a fully automated framework for the targeted generation of synthetic facial images under controlled demographic conditions.
Our method integrates both prompt-based (diffusion) and latent-space (GAN) generative models to synthesize demographically balanced data, which is automatically annotated with facial attributes and demographic labels.
This enables a rigorous, fine-grained fairness evaluation across gender, age, and ethnicity.

Experiments conducted on the MAAD-Face benchmark show that our method significantly reduces disparities between demographic groups without sacrificing overall classification performance.
The complete dataset, code, and evaluation tools are released to support reproducible and fairness-aware research.

📄 Read the paper (IEEE)

synthetic-fairness-faces is a modular and fully automated framework for generating and annotating synthetic facial datasets aimed at improving fairness in soft-biometric attribute recognition.

We designed the system to integrate Stable Diffusion, StyleGAN3, Pix2Pix Diffuser, MagFace, FaceXFormer, FairFace, and Slim-CNN into a cohesive pipeline for image generation, attribute annotation, classifier training, and demographic fairness evaluation.

🧠 Key Features

🧬 Dual image generation: prompt-based (Stable Diffusion) and latent-based (GAN)
🎯 Attribute control: generate faces by gender, age, ethnicity, and hair color
🧹 Quality filtering: discard low-quality or unrealistic faces using MagFace embeddings
🏷️ Automated annotation: soft-biometric attributes (FaceXFormer) and demographics (FairFace)
🧪 Fairness evaluation: compute FDR, Demographic Parity, and Equalized Odds across subgroups
📊 Model retraining: retrain Slim-CNN on real + synthetic data (CelebA + generated faces) and evaluate it on MAAD-Face
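
As a rough illustration of the two group metrics, the sketch below assumes binary ground truth and predictions for a single attribute and uses one common convention: the largest gap between any two subgroups, with Equalized Odds taken as the larger of the TPR and FPR gaps. The function names are hypothetical, the code is not taken from the repository's fairness module, and FDR is left out because its exact formulation follows the paper.

```python
from collections import defaultdict


def positive_rate_by_group(y_pred, groups, mask=None):
    """Return P(y_pred == 1) per subgroup, optionally only over rows where mask is True."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positive predictions, rows]
    for i, (pred, group) in enumerate(zip(y_pred, groups)):
        if mask is not None and not mask[i]:
            continue
        counts[group][0] += pred
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def demographic_parity_diff(y_pred, groups):
    """Largest gap in positive-prediction rate between any two subgroups."""
    rates = list(positive_rate_by_group(y_pred, groups).values())
    return max(rates) - min(rates)


def equalized_odds_diff(y_true, y_pred, groups):
    """Largest subgroup gap in TPR (rows with y_true == 1) or FPR (rows with y_true == 0)."""
    gaps = []
    for label in (1, 0):
        mask = [t == label for t in y_true]
        rates = list(positive_rate_by_group(y_pred, groups, mask).values())
        if rates:
            gaps.append(max(rates) - min(rates))
    return max(gaps, default=0.0)


if __name__ == "__main__":
    # Toy example: one binary attribute, subgroups split by gender.
    y_true = [1, 0, 1, 0, 1, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
    groups = ["F", "F", "F", "F", "M", "M", "M", "M"]
    print(demographic_parity_diff(y_pred, groups))      # 0.25
    print(equalized_odds_diff(y_true, y_pred, groups))  # 0.5
```

In the toy data both groups have identical false-positive rates, so the Equalized Odds gap comes entirely from the TPR difference (1.0 vs 0.5), while Demographic Parity only sees the overall positive rates (0.5 vs 0.25).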

🛠️ Technologies Used

Stable Diffusion (Realistic Vision V5.0 noVAE) for text-to-image generation
StyleGAN3 + Pix2Pix Diffuser for GAN-based latent manipulation and refinement
MagFace for identity embeddings and quality-aware filtering
FaceXFormer and FairFace for multi-attribute and demographic labeling
Slim-CNN as an efficient model for soft attribute classification
CelebA and MAAD-Face as benchmark datasets
Python, PyTorch, Hugging Face, OpenCV, dlib, NumPy

🏗️ Pipeline Overview

The synthetic-fairness-faces framework supports the full bias mitigation workflow with a sequence of automated steps:

1. Prompt checking and validation
   - Ensures the prompt.txt file is present and non-empty
2. Synthetic image generation
   - Default: Stable Diffusion (generate_images_with_selected_model.py)
   - Alternative: GAN-based methods (manual switch required)
3. Image resizing and quality filtering
   - Resize to 512x512, 178x218 (CelebA's aligned format), and 112x112 (MagFace input size)
   - Filter low-quality faces using MagFace magnitude
4. Feature extraction and evaluation
   - Extract MagFace embeddings
   - Discard samples with magnitude < 20
5. Attribute and demographic labeling
   - Two options:
     - Facer + FairFace: label with Facer, validate with FairFace
     - FaceXFormer: single-step labeling via a transformer model
6. Dataset partitioning and model training
   - Partition into train/val/test (80/10/10)
   - Train Slim-CNN on CelebA + synthetic data
7. Evaluation and fairness analysis
   - Evaluate Slim-CNN on the MAAD-Face benchmark
   - Analyze performance across gender, age, and ethnicity
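
The prompt check, magnitude filter, and 80/10/10 partition are simple enough to sketch in plain Python. This is a minimal illustration under stated assumptions, not the repository's code: prompt.txt is assumed to sit in the working directory, embeddings are assumed to be available as an image-id → vector mapping, the magnitude is taken as the L2 norm of the unnormalized MagFace feature (MagFace's quality indicator), and the split uses a seeded shuffle. All function names are hypothetical.

```python
import math
import random
from pathlib import Path

MAGNITUDE_THRESHOLD = 20.0  # samples below this magnitude are discarded


def check_prompt_file(path="prompt.txt"):
    """Prompt checking: fail early if the prompt file is missing or blank (location assumed)."""
    p = Path(path)
    if not p.is_file() or not p.read_text(encoding="utf-8").strip():
        raise FileNotFoundError(f"{path} is missing or empty")


def magface_magnitude(embedding):
    """L2 norm of the unnormalized MagFace feature; larger means higher face quality."""
    return math.sqrt(sum(x * x for x in embedding))


def filter_by_magnitude(embeddings, threshold=MAGNITUDE_THRESHOLD):
    """Quality filtering: keep image ids whose embedding magnitude is >= threshold."""
    return [img for img, emb in embeddings.items() if magface_magnitude(emb) >= threshold]


def split_80_10_10(items, seed=0):
    """Dataset partitioning: seeded shuffle, then 80% train, 10% val, remainder test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = int(0.8 * len(items))
    n_val = int(0.1 * len(items))
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


if __name__ == "__main__":
    toy = {"a.png": [12.0, 16.0], "b.png": [3.0, 4.0]}  # toy 2-D vectors
    print(filter_by_magnitude(toy))  # ['a.png']
    train, val, test = split_80_10_10(range(20))
    print(len(train), len(val), len(test))  # 16 2 2
```

With the toy vectors, [12, 16] has magnitude exactly 20 and is kept, while [3, 4] (magnitude 5) is discarded; real MagFace features are 512-dimensional. Seeding the shuffle keeps the partition reproducible across runs.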

You can execute the entire process automatically by running pipeline_automatic.py.
The script performs all steps in sequence, including image generation, preprocessing, labeling, training, and fairness evaluation.

📂 Project Structure

A detailed file tree is available here

🧪 Setup Instructions

Create the virtual environment

Install all required dependencies using Conda:

```
conda env create -f environment.yml
conda activate env
```

Download models and datasets

Model and dataset requirements are automatically checked at runtime by pipeline_automatic.py.
If any file is missing, the script will print the direct download links.

🔍 Pretrained Models

vae.pt for image generation (Stable Diffusion VAE)
Download → place in: models/image_generation/
FaceXFormer
Download → place in: models/facexformer/
FairFace models → place all 3 files in: models/FairFace/
MagFace model
Download → place in: models/magface/

🗂️ Datasets

CelebA dataset
Download
Extract to: img/CelebA/img_align_celeba/
VGGFace2 (test set only)
Download on Kaggle
Extract only the val/ folder to: img/vggface2_test/test/
MAAD_Face.csv
Download
Place in: img/vggface2_test/
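
For reference, a minimal standalone sketch of a layout check against the paths listed above, assuming they are relative to the repository root. The function name is hypothetical and this is not the logic inside pipeline_automatic.py; since exact checkpoint file names for FaceXFormer, FairFace, and MagFace are not given here, those folders are only checked for being non-empty.

```python
from pathlib import Path

# Locations taken from the Pretrained Models and Datasets lists.
REQUIRED_FILES = [
    "models/image_generation/vae.pt",
    "img/vggface2_test/MAAD_Face.csv",
]
REQUIRED_DIRS = [
    "models/facexformer",
    "models/FairFace",
    "models/magface",
    "img/CelebA/img_align_celeba",
    "img/vggface2_test/test",
]


def missing_requirements(root="."):
    """Return listed paths that are absent under root (directories must also be non-empty)."""
    base = Path(root)
    missing = [f for f in REQUIRED_FILES if not (base / f).is_file()]
    for d in REQUIRED_DIRS:
        path = base / d
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(d + "/")
    return missing


if __name__ == "__main__":
    for item in missing_requirements():
        # pipeline_automatic.py prints the direct download link at this point instead.
        print(f"missing: {item}")
```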

🚀 Run the Full Pipeline

To execute the full workflow:

```
python pipeline_automatic.py
```

You’ll be prompted to select the labeling method:

1 → Facer + FairFace (separate labeling + demographic validation)
2 → FaceXFormer (joint attribute and demographic labeling)

By default, image generation is performed using Stable Diffusion (SD).

🌀 To use GAN-based generation instead, comment out the SD script call inside pipeline_automatic.py and enable the appropriate GAN script from:
image_generation/src/GAN/

All downstream modules (resizing, filtering, annotation, training, and evaluation) remain compatible regardless of the generator used.

📊 View Training & Fairness Results

🔧 Training logs and checkpoints:

```
pytorch-slim-cnn/checkpoints*
pytorch-slim-cnn/plots/
```

📓 Evaluation Notebook (opens in Jupyter):
```
pytorch-slim-cnn/evaluate_both.ipynb
```
📄 Fairness metrics and predictions:
```
fairness/*.csv
```
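
To get a quick overview of the CSV outputs without assuming their column layout, a small stdlib-only sketch could look like the following; only the folder name comes from above, while the function name and summary format are illustrative.

```python
import csv
from pathlib import Path


def summarize_fairness_csvs(folder="fairness"):
    """Map each CSV file name in folder to (row count, column names)."""
    summary = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as fh:
            reader = csv.DictReader(fh)
            rows = list(reader)
            summary[path.name] = (len(rows), reader.fieldnames or [])
    return summary


if __name__ == "__main__":
    for name, (n_rows, columns) in summarize_fairness_csvs().items():
        print(f"{name}: {n_rows} rows, columns = {columns}")
```

For plots and per-group comparisons, the evaluation notebook remains the intended entry point.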

👥 Contributing & Contact

Mattia Maucioni | 📧 mattiamaucioni [at] icloud.com | Data Science Field
Marco Di Maio | 📧 marcodimaio2102 [at] gmail.com | Data Science Field

📖 Paper Reference

L. Cascone, M. Di Maio, V. Loia, M. Maucioni, M. Nappi, and C. Pero,
“Synthetic Data for Fairness: Bias Mitigation in Facial Attribute Recognition,”
IEEE Open Journal of the Computer Society, vol. 6, pp. 1703–1714, 2025.
doi: 10.1109/OJCS.2025.3622694

📚 BibTeX

```
@ARTICLE{11206468,
  author={Cascone, Lucia and Di Maio, Marco and Loia, Vincenzo and Maucioni, Mattia and Nappi, Michele and Pero, Chiara},
  journal={IEEE Open Journal of the Computer Society},
  title={Synthetic Data for Fairness: Bias Mitigation in Facial Attribute Recognition},
  year={2025},
  volume={6},
  number={},
  pages={1703-1714},
  keywords={Face recognition;Facial features;Synthetic data;Computer architecture;Annotations;Prevention and mitigation;Accuracy;Training;Image color analysis;Hair;Facial attribute recognition;bias mitigation;demographic fairness;synthetic data;stable diffusion;generative adversarial networks},
  doi={10.1109/OJCS.2025.3622694}}
```

📄 License

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.

You may share and adapt this work for non-commercial purposes only, as long as you give appropriate credit and distribute your contributions under the same license.
For commercial use, explicit permission from the authors is required.
