This repository contains the official codebase for our study on bias mitigation in soft facial attribute recognition.
We present a fully automated framework for the targeted generation of synthetic facial images under controlled demographic conditions.
Our method integrates both prompt-based (diffusion) and latent-space (GAN) generative models to synthesize demographically balanced data, which is automatically annotated with facial attributes and demographic labels.
This enables a rigorous, fine-grained fairness evaluation across gender, age, and ethnicity.
Experiments conducted on the MAAD-Face benchmark show that our method significantly reduces disparities between demographic groups—without sacrificing overall classification performance.
The complete dataset, code, and evaluation tools are released to support reproducible and fairness-aware research.
synthetic-fairness-faces is a modular and fully automated framework for generating and annotating synthetic facial datasets aimed at improving fairness in soft-biometric attribute recognition.
We designed the system to integrate Stable Diffusion, StyleGAN3, Pix2Pix Diffuser, MagFace, FaceXFormer, FairFace, and Slim-CNN into a cohesive pipeline for image generation, attribute annotation, classifier training, and demographic fairness evaluation.
- 🧬 Dual image generation: prompt-based (Stable Diffusion) and latent-based (GAN)
- 🎯 Attribute control: generate faces by gender, age, ethnicity, and hair color
- 🧹 Quality filtering: discard low-quality or unrealistic faces using MagFace embeddings
- 🏷️ Automated annotation: soft-biometric attributes (FaceXFormer) and demographics (FairFace)
- 🧪 Fairness evaluation: compute FDR, Demographic Parity, and Equalized Odds across subgroups
- 📊 Model retraining: evaluate Slim-CNN on real + synthetic data (CelebA / MAAD-Face)
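The subgroup fairness metrics listed above (Demographic Parity, Equalized Odds) can be computed from predictions and group labels alone. A minimal NumPy sketch, assuming binary labels/predictions and a parallel array of group identifiers; function names are illustrative and not part of this repo's API:

```python
# Hedged sketch of per-subgroup fairness gaps; not the repo's own evaluation code.
import numpy as np

def demographic_parity_gap(y_pred, groups):
    """Max difference in positive-prediction rate across demographic groups."""
    rates = [y_pred[groups == g].mean() for g in np.unique(groups)]
    return max(rates) - min(rates)

def equalized_odds_gap(y_true, y_pred, groups):
    """Max gaps in true-positive rate and false-positive rate across groups."""
    tprs, fprs = [], []
    for g in np.unique(groups):
        t, p = y_true[groups == g], y_pred[groups == g]
        tprs.append(p[t == 1].mean())  # TPR within group g
        fprs.append(p[t == 0].mean())  # FPR within group g
    return max(tprs) - min(tprs), max(fprs) - min(fprs)

# Toy example: two groups "a" and "b".
y_true = np.array([1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0])
groups = np.array(["a", "a", "a", "b", "b", "b"])
print(demographic_parity_gap(y_pred, groups))  # gap between 2/3 and 1/3
```

A perfectly fair classifier would drive these gaps toward zero; the paper's evaluation reports such disparities per attribute across gender, age, and ethnicity subgroups.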
- Stable Diffusion (Realistic Vision V5.0 noVAE) for text-to-image generation
- StyleGAN3 + Pix2Pix Diffuser for GAN-based latent manipulation and refinement
- MagFace for identity embeddings and quality-aware filtering
- FaceXFormer and FairFace for multi-attribute and demographic labeling
- Slim-CNN: efficient model for soft attribute classification
- CelebA and MAAD-Face as benchmark datasets
- Python, PyTorch, Hugging Face, OpenCV, dlib, NumPy
The synthetic-fairness-faces framework supports the full bias mitigation workflow with a sequence of automated steps:
1. **Prompt checking and validation**
   - Ensures the `prompt.txt` file is present and non-empty

2. **Synthetic image generation**
   - Default: Stable Diffusion (`generate_images_with_selected_model.py`)
   - Alternative: GAN-based methods (manual switch required)

3. **Image resizing and quality filtering**
   - Resize to 512x512, 178x218, and 112x112
   - Filter low-quality faces using the MagFace magnitude

4. **Feature extraction and evaluation**
   - Extract MagFace embeddings
   - Discard samples with magnitude < 20

5. **Attribute and demographic labeling**
   - Two options:
     - Facer + FairFace: label with Facer, validate with FairFace
     - FaceXFormer: single-step labeling via transformer model

6. **Dataset partitioning and model training**
   - Partition into train/val/test splits (80/10/10)
   - Train Slim-CNN on CelebA + synthetic data

7. **Evaluation and fairness analysis**
   - Evaluate Slim-CNN on the MAAD-Face benchmark
   - Analyze performance across gender, age, and ethnicity
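The quality filter in steps 3-4 exploits a property of MagFace: image quality is encoded in the L2 norm (magnitude) of the unnormalized embedding. A minimal sketch of that filter, assuming raw embeddings are already extracted; the function name and toy data are illustrative:

```python
# Illustrative magnitude-based quality filter; MagFace encodes face quality
# in the L2 norm of the raw (unnormalized) identity embedding.
import numpy as np

MAG_THRESHOLD = 20.0  # pipeline discards samples with magnitude < 20

def filter_by_magnitude(embeddings, paths, threshold=MAG_THRESHOLD):
    """Keep only samples whose embedding magnitude meets the threshold."""
    mags = np.linalg.norm(embeddings, axis=1)  # per-sample L2 norm
    keep = mags >= threshold
    return [p for p, k in zip(paths, keep) if k], mags

# Toy 2-D "embeddings": norm 30 passes, norm ~11.2 is discarded.
embs = np.array([[30.0, 0.0], [10.0, 5.0]])
kept, mags = filter_by_magnitude(embs, ["good.jpg", "blurry.jpg"])
print(kept)  # ['good.jpg']
```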
You can execute the entire process automatically by running `pipeline_automatic.py`.
The script performs all steps in sequence, including image generation, preprocessing, labeling, training, and fairness evaluation.
A detailed file tree is available here
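The sequential behavior described above can be sketched as a simple driver that runs each stage as a standalone script and aborts on failure. Only `generate_images_with_selected_model.py` is a documented script name here; the other stage names are hypothetical placeholders, not files in this repo:

```python
# Hedged sketch of a sequential pipeline driver. Stage names other than
# generate_images_with_selected_model.py are placeholders for illustration.
import subprocess
import sys

STAGES = [
    "generate_images_with_selected_model.py",  # step 2: image generation
    "resize_and_filter.py",                    # steps 3-4 (hypothetical name)
    "label_attributes.py",                     # step 5 (hypothetical name)
    "train_slimcnn.py",                        # step 6 (hypothetical name)
    "evaluate_fairness.py",                    # step 7 (hypothetical name)
]

def run_pipeline(stages=STAGES):
    """Run each stage in order; stop the pipeline if any stage fails."""
    for script in stages:
        print(f"[pipeline] running {script}")
        result = subprocess.run([sys.executable, script])
        if result.returncode != 0:
            raise RuntimeError(f"Stage failed: {script}")
```

Because each stage is an independent process, any single stage (e.g. the generator) can be swapped without touching the rest of the sequence.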
Install all required dependencies using Conda:
```bash
conda env create -f environment.yml
conda activate env
```

Model and dataset requirements are automatically checked at runtime by `pipeline_automatic.py`.
If any file is missing, the script will print the direct download links.
Required models:

- `vae.pt` for image generation (Stable Diffusion VAE)
  Download → place in: `models/image_generation/`
- FaceXFormer
  Download → place in: `models/facexformer/`
- FairFace models → place all 3 files in: `models/FairFace/`
- MagFace model
  Download → place in: `models/magface/`

Required datasets:

- CelebA dataset
  Download
  Extract to: `img/CelebA/img_align_celeba/`
- VGGFace2 (test set only)
  Download on Kaggle
  Extract only the `val/` folder to: `img/vggface2_test/test/`
- `MAAD_Face.csv`
  Download
  Place in: `img/vggface2_test/`
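The runtime requirement check could look like the sketch below, assuming a mapping from each expected path to its download page. The mapping keys follow the layout listed above, but the link strings are placeholders; `pipeline_automatic.py` prints the real ones:

```python
# Hedged sketch of the startup check for required model/dataset files.
# Links are placeholders; the actual script prints direct download links.
from pathlib import Path

REQUIRED = {
    "models/image_generation/vae.pt": "<Stable Diffusion VAE download link>",
    "models/facexformer/": "<FaceXFormer download link>",
    "models/magface/": "<MagFace download link>",
    "img/CelebA/img_align_celeba/": "<CelebA download link>",
}

def check_requirements(required=REQUIRED, root="."):
    """Report every required path that is missing, with its download link."""
    missing = []
    for rel_path, link in required.items():
        if not (Path(root) / rel_path).exists():
            missing.append((rel_path, link))
            print(f"Missing: {rel_path} -> download from {link}")
    return missing
```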
To execute the full workflow:
```bash
python pipeline_automatic.py
```

You'll be prompted to select the labeling method:

- `1` → Facer + FairFace (separate labeling + demographic validation)
- `2` → FaceXFormer (joint attribute and demographic labeling)
By default, image generation is performed using Stable Diffusion (SD).
🌀 To use GAN-based generation instead, comment out the SD script inside `pipeline_automatic.py` and enable the appropriate one in: `image_generation/src/GAN/`
All downstream modules (resizing, filtering, annotation, training, and evaluation) remain compatible regardless of the generator used.
- 🔧 Training logs and checkpoints: `pytorch-slim-cnn/checkpoints*`, `pytorch-slim-cnn/plots/`
- 📓 Evaluation notebook (opens in Jupyter): `pytorch-slim-cnn/evaluate_both.ipynb`
- 📄 Fairness metrics and predictions: `fairness/*.csv`
- Mattia Maucioni | 📧 mattiamaucioni [at] icloud.com | Data Science Field
- Marco Di Maio | 📧 marcodimaio2102 [at] gmail.com | Data Science Field
L. Cascone, M. Di Maio, V. Loia, M. Maucioni, M. Nappi, and C. Pero,
“Synthetic Data for Fairness: Bias Mitigation in Facial Attribute Recognition,”
IEEE Open Journal of the Computer Society, vol. 6, pp. 1703–1714, 2025.
doi: 10.1109/OJCS.2025.3622694
```bibtex
@ARTICLE{11206468,
  author={Cascone, Lucia and Maio, Marco Di and Loia, Vincenzo and Maucioni, Mattia and Nappi, Michele and Pero, Chiara},
  journal={IEEE Open Journal of the Computer Society},
  title={Synthetic Data for Fairness: Bias Mitigation in Facial Attribute Recognition},
  year={2025},
  volume={6},
  number={},
  pages={1703-1714},
  keywords={Face recognition;Facial features;Synthetic data;Computer architecture;Annotations;Prevention and mitigation;Accuracy;Training;Image color analysis;Hair;Facial attribute recognition;bias mitigation;demographic fairness;synthetic data;stable diffusion;generative adversarial networks},
  doi={10.1109/OJCS.2025.3622694}}
```

This project is licensed under the CC BY-NC-SA 4.0 License.
You may share and adapt this work for non-commercial purposes only, as long as you give appropriate credit and distribute your contributions under the same license.
For commercial use, explicit permission from the authors is required.

