Marco210210/Synthetic-Fairness-Faces

Synthetic Data for Fairness: Bias Mitigation in Facial Attribute Recognition

This repository contains the official codebase for our study on bias mitigation in soft facial attribute recognition.

We present a fully automated framework for the targeted generation of synthetic facial images under controlled demographic conditions.
Our method integrates both prompt-based (diffusion) and latent-space (GAN) generative models to synthesize demographically balanced data, which is automatically annotated with facial attributes and demographic labels.
This enables a rigorous, fine-grained fairness evaluation across gender, age, and ethnicity.

Experiments on the MAAD-Face benchmark show that our method significantly reduces disparities between demographic groups without sacrificing overall classification performance.
The complete dataset, code, and evaluation tools are released to support reproducible and fairness-aware research.

📄 Read the paper (IEEE)

synthetic-fairness-faces is a modular and fully automated framework for generating and annotating synthetic facial datasets aimed at improving fairness in soft-biometric attribute recognition.

We designed the system to integrate Stable Diffusion, StyleGAN3, Pix2Pix Diffuser, MagFace, FaceXFormer, FairFace, and Slim-CNN into a cohesive pipeline for image generation, attribute annotation, classifier training, and demographic fairness evaluation.


🧠 Key Features

  • 🧬 Dual image generation: prompt-based (Stable Diffusion) and latent-based (GAN)
  • 🎯 Attribute control: generate faces by gender, age, ethnicity, and hair color
  • 🧹 Quality filtering: discard low-quality or unrealistic faces using MagFace embeddings
  • 🏷️ Automated annotation: soft-biometric attributes (FaceXFormer) and demographics (FairFace)
  • 🧪 Fairness evaluation: compute FDR, Demographic Parity, and Equalized Odds across subgroups
  • 📊 Model retraining: evaluate Slim-CNN on real + synthetic data (CelebA / MAAD-Face)
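
To make the fairness metrics above more concrete, here is a minimal sketch of how Demographic Parity and Equalized Odds gaps are commonly computed for one binary attribute across subgroups. It uses the usual textbook definitions (largest difference in positive-prediction rate, and largest difference in TPR or FPR) and is not necessarily how the repository's evaluation code computes them; the function names are hypothetical, and FDR is left out because its exact definition is not given here.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group positive-prediction rate, TPR, and FPR for binary 0/1 labels."""
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "tp": 0, "neg": 0, "fp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += p
        if t == 1:
            c["pos"] += 1
            c["tp"] += p
        else:
            c["neg"] += 1
            c["fp"] += p
    rates = {}
    for g, c in counts.items():
        rates[g] = {
            "ppr": c["pred_pos"] / c["n"],
            # Groups with no positives (or no negatives) get 0.0 by convention here.
            "tpr": c["tp"] / c["pos"] if c["pos"] else 0.0,
            "fpr": c["fp"] / c["neg"] if c["neg"] else 0.0,
        }
    return rates


def demographic_parity_gap(rates):
    """Largest difference in positive-prediction rate between any two groups."""
    values = [r["ppr"] for r in rates.values()]
    return max(values) - min(values)


def equalized_odds_gap(rates):
    """Larger of the TPR gap and the FPR gap across groups."""
    tpr = [r["tpr"] for r in rates.values()]
    fpr = [r["fpr"] for r in rates.values()]
    return max(max(tpr) - min(tpr), max(fpr) - min(fpr))


# Toy example: one attribute, two hypothetical subgroups "a" and "b".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
rates = group_rates(y_true, y_pred, groups)
print(demographic_parity_gap(rates), equalized_odds_gap(rates))
```

A gap of 0 means the groups are treated identically under that criterion; larger values indicate a larger disparity.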

🛠️ Technologies Used

  • Stable Diffusion (Realistic Vision V5.0 noVAE) for text-to-image generation
  • StyleGAN3 + Pix2Pix Diffuser for GAN-based latent manipulation and refinement
  • MagFace for identity embeddings and quality-aware filtering
  • FaceXFormer and FairFace for multi-attribute and demographic labeling
  • Slim-CNN: efficient model for soft attribute classification
  • CelebA and MAAD-Face as benchmark datasets
  • Python, PyTorch, Hugging Face, OpenCV, dlib, NumPy

🏗️ Pipeline Overview

The synthetic-fairness-faces framework supports the full bias mitigation workflow with a sequence of automated steps:

  1. Prompt checking and validation

    • Ensures the prompt.txt file is present and non-empty
  2. Synthetic image generation

    • Default: Stable Diffusion (generate_images_with_selected_model.py)
    • Alternative: GAN-based methods (manual switch required)
  3. Image resizing and quality filtering

    • Resize to 512x512, 178x218, and 112x112
    • Filter low-quality faces using MagFace magnitude
  4. Feature extraction and evaluation

    • Extract MagFace embeddings
    • Discard samples with magnitude < 20
  5. Attribute and demographic labeling

    • Two options:
      • Facer + FairFace: label with Facer, validate with FairFace
      • FaceXFormer: single-step labeling via transformer model
  6. Dataset partitioning and model training

    • Partition into train/val/test (80/10/10)
    • Train Slim-CNN on CelebA + synthetic data
  7. Evaluation and fairness analysis

    • Evaluate Slim-CNN on MAAD-Face benchmark
    • Analyze performance across gender, age, and ethnicity
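
Steps 4 and 6 above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the repository's code: embeddings are assumed to be available as lists of floats keyed by file name, quality is taken to be the L2 norm of the MagFace embedding, and the helper names (`filter_by_magnitude`, `split_80_10_10`) are hypothetical. Only the threshold of 20 and the 80/10/10 ratio come from the steps listed above.

```python
import random

MAGNITUDE_THRESHOLD = 20.0  # step 4: discard samples with magnitude < 20


def embedding_magnitude(embedding):
    """L2 norm of an embedding vector (the quality proxy assumed here)."""
    return sum(x * x for x in embedding) ** 0.5


def filter_by_magnitude(embeddings, threshold=MAGNITUDE_THRESHOLD):
    """Return the sample names whose embedding magnitude is >= threshold."""
    return [name for name, emb in embeddings.items()
            if embedding_magnitude(emb) >= threshold]


def split_80_10_10(items, seed=0):
    """Shuffle deterministically and split into train/val/test (80/10/10)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder, so no sample is dropped
    return train, val, test


# Toy embeddings (2-D for readability; real embeddings are much longer).
embeddings = {
    "low.png": [3.0, 4.0],     # magnitude 5, discarded
    "edge.png": [12.0, 16.0],  # magnitude 20, kept
    "high.png": [30.0, 40.0],  # magnitude 50, kept
}
kept = filter_by_magnitude(embeddings)
train, val, test = split_80_10_10(range(10))
print(kept, len(train), len(val), len(test))
```

Using integer arithmetic for the split sizes avoids floating-point rounding surprises, and assigning the remainder to the test split keeps every sample in exactly one partition.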

You can execute the entire process automatically by running pipeline_automatic.py.
The script performs all steps in sequence, including image generation, preprocessing, labeling, training, and fairness evaluation.
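
As an illustration of the first step in that sequence, the check that prompt.txt is present and non-empty could look like the sketch below. The function name `load_prompts` and the one-prompt-per-line layout are assumptions made for this example; the actual script may read the file differently.

```python
from pathlib import Path


def load_prompts(path="prompt.txt"):
    """Return the non-blank lines of the prompt file, or raise if unusable."""
    prompt_file = Path(path)
    if not prompt_file.is_file():
        raise FileNotFoundError(f"{prompt_file} not found")
    # Assumption: one prompt per line; blank lines are ignored.
    lines = prompt_file.read_text(encoding="utf-8").splitlines()
    prompts = [line.strip() for line in lines if line.strip()]
    if not prompts:
        raise ValueError(f"{prompt_file} is empty")
    return prompts
```

Failing early here avoids loading the generator only to discover that there is nothing to generate.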


📂 Project Structure

A detailed file tree is available here


🧪 Setup Instructions

Create the virtual environment

Install all required dependencies using Conda:

conda env create -f environment.yml
conda activate env

Download models and datasets

Model and dataset requirements are automatically checked at runtime by pipeline_automatic.py.
If any file is missing, the script will print the direct download links.

🔍 Pretrained Models

🗂️ Datasets

  • CelebA dataset
    Download
    Extract to: img/CelebA/img_align_celeba/

  • VGGFace2 (test set only)
    Download on Kaggle
    Extract only the val/ folder to: img/vggface2_test/test/

  • MAAD_Face.csv
    Download
    Place in: img/vggface2_test/
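
Before launching the pipeline, you can verify the dataset layout yourself. The sketch below only checks that the three locations listed above exist relative to the repository root; it is a hypothetical helper (`find_missing` is not part of the codebase) and does not replace the runtime check in pipeline_automatic.py, which also covers the pretrained models.

```python
from pathlib import Path

# Dataset locations taken from the list above, relative to the repository root.
REQUIRED_PATHS = [
    "img/CelebA/img_align_celeba",
    "img/vggface2_test/test",
    "img/vggface2_test/MAAD_Face.csv",
]


def find_missing(root=".", required=REQUIRED_PATHS):
    """Return the required paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).exists()]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing dataset paths:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All dataset paths found.")
```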


🚀 Run the Full Pipeline

To execute the full workflow:

python pipeline_automatic.py

You’ll be prompted to select the labeling method:

  • 1 → Facer + FairFace (separate labeling + demographic validation)
  • 2 → FaceXFormer (joint attribute and demographic labeling)

By default, image generation is performed using Stable Diffusion (SD).

🌀 To use GAN-based generation instead, comment out the SD script call inside pipeline_automatic.py and enable the appropriate script from:

image_generation/src/GAN/

All downstream modules (resizing, filtering, annotation, training, and evaluation) remain compatible regardless of the generator used.


📊 View Training & Fairness Results

  • 🔧 Training logs and checkpoints:

    pytorch-slim-cnn/checkpoints*
    pytorch-slim-cnn/plots/
    
  • 📓 Evaluation Notebook (opens in Jupyter):

    pytorch-slim-cnn/evaluate_both.ipynb
    
  • 📄 Fairness metrics and predictions:

    fairness/*.csv
    

👥 Contributing & Contact

  • Mattia Maucioni | 📧 mattiamaucioni [at] icloud.com | Data Science Field
  • Marco Di Maio | 📧 marcodimaio2102 [at] gmail.com | Data Science Field

📖 Paper Reference

L. Cascone, M. Di Maio, V. Loia, M. Maucioni, M. Nappi, and C. Pero,
“Synthetic Data for Fairness: Bias Mitigation in Facial Attribute Recognition,”
IEEE Open Journal of the Computer Society, vol. 6, pp. 1703–1714, 2025.
doi: 10.1109/OJCS.2025.3622694

📚 BibTeX

@ARTICLE{11206468,
  author={Cascone, Lucia and Di Maio, Marco and Loia, Vincenzo and Maucioni, Mattia and Nappi, Michele and Pero, Chiara},
  journal={IEEE Open Journal of the Computer Society}, 
  title={Synthetic Data for Fairness: Bias Mitigation in Facial Attribute Recognition}, 
  year={2025},
  volume={6},
  number={},
  pages={1703-1714},
  keywords={Face recognition;Facial features;Synthetic data;Computer architecture;Annotations;Prevention and mitigation;Accuracy;Training;Image color analysis;Hair;Facial attribute recognition;bias mitigation;demographic fairness;synthetic data;stable diffusion;generative adversarial networks},
  doi={10.1109/OJCS.2025.3622694}}

📄 License

This project is licensed under the CC BY-NC-SA 4.0 License

You may share and adapt this work for non-commercial purposes only, as long as you give appropriate credit and distribute your contributions under the same license.
For commercial use, explicit permission from the authors is required.
