This repository implements an unsupervised Deepfake Detection System using a Denoising Autoencoder (DAE) architecture augmented with a ResNet18 perceptual loss backbone.
The model learns the distribution of real faces. During inference, it flags deepfake videos by detecting high reconstruction errors and semantic feature mismatches, since the model fails to accurately reconstruct unseen manipulation artifacts.
- Denoising Autoencoder (DAE): Trained to reconstruct clean faces from Gaussian-noised inputs, forcing the model to learn meaningful facial features rather than memorizing pixels.
- Perceptual Feature Scoring: Integrated a frozen ResNet18 (ImageNet pre-trained) to calculate feature distances at multiple layers (Layer 1, 2, and 3), capturing high-level semantic discrepancies.
- Hybrid Anomaly Metric: Uses a weighted ensemble of three distinct error metrics for robust detection:
- Global L1 Error: Pixel-wise absolute difference.
- Patch-based Error: Localized error pooling to detect specific artifacts (e.g., eye/mouth glitches).
- Feature Distance: Semantic difference in the ResNet latent space.
- Clean & Modular Code: Implemented with PyTorch best practices, including custom DataLoaders and modular model definitions.
├── data/ # Dataset directory
│ ├── train/ # Training data (Real faces only)
│ └── val/ # Validation data (Real & Fake for testing)
├── models/ # Neural network modules (bundled in train.py in the single-script version)
├── best_model_multiresnet.pth # Saved model checkpoint
├── train.py # Main training and evaluation script
└── README.md # Project documentation
The model is trained only on real face images.
- Input: Real Face + Gaussian Noise
- Encoder: 4-Layer CNN → Latent Vector
- Decoder: Latent Vector → Reconstructed Face
- Loss Function: L1 Loss between Reconstructed Image and Original Clean Image.
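The pipeline above can be sketched as a single training step. This is an illustrative stand-in, not the repository's exact architecture: the layer widths, noise level (`noise_std`), and the `DAE`/`train_step` names are assumptions; only the 4-layer structure, latent size of 32, Gaussian corruption, and L1 reconstruction loss come from the description.

```python
import torch
import torch.nn as nn

class DAE(nn.Module):
    """Illustrative 4-layer conv encoder/decoder; channel widths are assumed."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),    # 128 -> 64
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(128, latent_dim, 4, 2, 1),     # 16 -> 8
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_step(model, optimizer, clean, noise_std=0.1):
    """One denoising step: corrupt with Gaussian noise, reconstruct, L1 loss."""
    noisy = (clean + noise_std * torch.randn_like(clean)).clamp(0, 1)
    recon = model(noisy)
    loss = nn.functional.l1_loss(recon, clean)  # compared to the CLEAN target
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that the loss is taken against the clean image, not the noisy input — that is what makes the autoencoder denoising.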
For every frame in the test set, an Anomaly Score is calculated using a weighted formula derived from experimentation:
- If the input is a Deepfake, the model (which only knows real faces) will fail to reconstruct the manipulation artifacts, resulting in a high score.
- If the input is Real, the reconstruction will be accurate, resulting in a low score.
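The hybrid score can be sketched as a weighted sum of the three error terms. The structure (global L1 + patch-pooled error + feature distance) follows the description above, but the weights, patch size, and function name are placeholders, not the tuned values found through experimentation.

```python
import torch
import torch.nn.functional as F

def anomaly_score(clean, recon, feats_real, feats_recon,
                  w_pixel=1.0, w_patch=1.0, w_feat=1.0, patch=16):
    """Per-image hybrid anomaly score (sketch; weights are placeholders).

    feats_real / feats_recon: matching lists of ResNet feature maps
    for the input and its reconstruction.
    """
    # 1) Global L1 error: mean absolute pixel difference per image.
    pixel = (clean - recon).abs().mean(dim=(1, 2, 3))
    # 2) Patch-based error: worst local patch of the per-pixel error map,
    #    which surfaces concentrated artifacts (e.g. eye/mouth glitches).
    err_map = (clean - recon).abs().mean(dim=1, keepdim=True)
    patch_err = F.avg_pool2d(err_map, patch).flatten(1).max(dim=1).values
    # 3) Feature distance: mean squared gap across the ResNet feature maps.
    feat = sum(F.mse_loss(a, b, reduction="none").mean(dim=(1, 2, 3))
               for a, b in zip(feats_real, feats_recon)) / len(feats_real)
    return w_pixel * pixel + w_patch * patch_err + w_feat * feat
```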
- Clone the repository:
git clone https://github.com/actuallysena/deepfake-anomaly-detection.git
cd deepfake-anomaly-detection
- Install dependencies:
pip install torch torchvision numpy scikit-learn tqdm pillow
Organize your dataset as follows:
- Training: Contains only real face images.
- Validation: Contains both real and deepfake images for AUC evaluation.
data/
train/
real/ # ~5000+ real face frames
val/
real/ # Real faces for testing
fake/ # Deepfake faces for testing
Run the training script. The script automatically handles device selection (CUDA/CPU) and saves the best model based on ROC-AUC score.
python train.py
- Hyperparameters:
- Image Size: 128x128
- Latent Dimension: 32
- Batch Size: 128
- Epochs: 500
The system evaluates performance using the ROC-AUC (Area Under the Curve) metric.
- Best AUC Achieved: 0.89 (update with your result)

The hybrid scoring mechanism achieved a higher AUC than pixel-only loss, particularly on subtle artifacts in high-quality deepfakes.
This project was developed for research purposes to explore unsupervised anomaly detection methods in digital forensics.
