<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.3.4">Jekyll</generator><link href="https://yakhyo.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://yakhyo.github.io/" rel="alternate" type="text/html" hreflang="en" /><updated>2026-03-05T22:05:44+09:00</updated><id>https://yakhyo.github.io/feed.xml</id><title type="html">Home</title><subtitle>Exploring Life Through the Lens of a Computer Scientist: AI, Tech, and Beyond.</subtitle><author><name>Yakhyokhuja Valikhujaev</name></author><entry><title type="html">UniFace: All-in-One Face Analysis Toolkit for Production</title><link href="https://yakhyo.github.io/blog/2025/11/uniface-all-in-one-face-analysis/" rel="alternate" type="text/html" title="UniFace: All-in-One Face Analysis Toolkit for Production" /><published>2025-11-11T12:00:00+09:00</published><updated>2025-11-11T12:00:00+09:00</updated><id>https://yakhyo.github.io/blog/2025/11/uniface-all-in-one-face-analysis</id><content type="html" xml:base="https://yakhyo.github.io/blog/2025/11/uniface-all-in-one-face-analysis/"><![CDATA[<p><strong>UniFace</strong> is a lightweight, production-ready face analysis library built on ONNX Runtime. It provides a unified API for face detection, recognition, landmark detection, attribute analysis, face parsing, gaze estimation, anti-spoofing, and privacy features.</p>

<p><a href="https://opensource.org/licenses/MIT"><img src="https://img.shields.io/badge/License-MIT-blue.svg" alt="License" /></a>
<img src="https://img.shields.io/badge/Python-3.10%2B-blue" alt="Python" />
<a href="https://pypi.org/project/uniface/"><img src="https://img.shields.io/pypi/v/uniface.svg" alt="PyPI Version" /></a>
<a href="https://pepy.tech/project/uniface"><img src="https://pepy.tech/badge/uniface" alt="Downloads" /></a></p>

<hr />

<h2 id="documentation-moved">Documentation Moved</h2>

<p>The comprehensive documentation for UniFace has been moved to a dedicated documentation site with tutorials, API references, and guides.</p>

<p><strong><a href="https://yakhyo.github.io/uniface/">UniFace Docs Page →</a></strong></p>

<hr />

<h2 id="interactive-notebooks">Interactive Notebooks</h2>

<p>Run examples directly in your browser with Google Colab:</p>

<table>
  <thead>
    <tr>
      <th>Notebook</th>
      <th>Colab</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Face Detection</td>
      <td><a href="https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/01_face_detection.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" /></a></td>
      <td>Detect faces and 5-point landmarks</td>
    </tr>
    <tr>
      <td>Face Alignment</td>
      <td><a href="https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/02_face_alignment.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" /></a></td>
      <td>Align faces for recognition</td>
    </tr>
    <tr>
      <td>Face Verification</td>
      <td><a href="https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/03_face_verification.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" /></a></td>
      <td>Compare faces for identity</td>
    </tr>
    <tr>
      <td>Face Search</td>
      <td><a href="https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/04_face_search.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" /></a></td>
      <td>Find a person in group photos</td>
    </tr>
    <tr>
      <td>Face Analyzer</td>
      <td><a href="https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/05_face_analyzer.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" /></a></td>
      <td>All-in-one face analysis</td>
    </tr>
    <tr>
      <td>Face Parsing</td>
      <td><a href="https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/06_face_parsing.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" /></a></td>
      <td>Semantic face segmentation</td>
    </tr>
    <tr>
      <td>Face Anonymization</td>
      <td><a href="https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/07_face_anonymization.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" /></a></td>
      <td>Privacy-preserving blur</td>
    </tr>
    <tr>
      <td>Gaze Estimation</td>
      <td><a href="https://colab.research.google.com/github/yakhyo/uniface/blob/main/examples/08_gaze_estimation.ipynb"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" /></a></td>
      <td>Gaze direction estimation</td>
    </tr>
  </tbody>
</table>

<p><a href="https://yakhyo.github.io/uniface/notebooks/">View all notebooks →</a></p>
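<p>Under the hood, face verification reduces to comparing embedding vectors produced by a recognition model. The plain-Python sketch below shows the idea; the threshold value is illustrative and is not taken from UniFace's defaults:</p>

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def is_same_person(emb1, emb2, threshold=0.4):
    # Threshold is illustrative; tune it on a validation set
    # for the target false-accept rate.
    return cosine_similarity(emb1, emb2) >= threshold
```

<p>In practice the embeddings are L2-normalized high-dimensional vectors, so the comparison is a single dot product per pair.</p>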

<hr />

<h2 id="quick-install">Quick Install</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install </span>uniface
</code></pre></div></div>

<p>For GPU support:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install</span> <span class="s1">'uniface[gpu]'</span>
</code></pre></div></div>

<hr />

<p><strong>Resources:</strong></p>
<ul>
  <li><strong>Documentation</strong>: <a href="https://yakhyo.github.io/uniface/">yakhyo.github.io/uniface</a></li>
  <li><strong>GitHub</strong>: <a href="https://github.com/yakhyo/uniface">github.com/yakhyo/uniface</a></li>
  <li><strong>PyPI</strong>: <a href="https://pypi.org/project/uniface">pypi.org/project/uniface</a></li>
</ul>]]></content><author><name>Yakhyokhuja Valikhujaev</name></author><category term="deep-learning" /><category term="computer-vision" /><category term="facial-recognition" /><summary type="html"><![CDATA[UniFace is a lightweight, production-ready face analysis library built on ONNX Runtime. It provides a unified API for face detection, recognition, landmark detection, attribute analysis, face parsing, gaze estimation, anti-spoofing, and privacy features.]]></summary></entry><entry><title type="html">Face Parsing using BiSeNet for Real-time Semantic Segmentation</title><link href="https://yakhyo.github.io/blog/2024/11/face-parsing-bisenet/" rel="alternate" type="text/html" title="Face Parsing using BiSeNet for Real-time Semantic Segmentation" /><published>2024-11-29T12:00:00+09:00</published><updated>2024-11-29T12:00:00+09:00</updated><id>https://yakhyo.github.io/blog/2024/11/face-parsing-bisenet</id><content type="html" xml:base="https://yakhyo.github.io/blog/2024/11/face-parsing-bisenet/"><![CDATA[<p>BiSeNet (Bilateral Segmentation Network) is a state-of-the-art model for real-time semantic segmentation, initially proposed in the paper <a href="https://arxiv.org/abs/1808.00897">Bilateral Segmentation Network for Real-time Semantic Segmentation</a>. The architecture addresses the fundamental challenge in semantic segmentation: achieving high accuracy while maintaining real-time performance.</p>

<h2 id="architecture-overview">Architecture Overview</h2>

<p>BiSeNet combines two complementary paths to balance spatial detail and semantic context:</p>

<ul>
  <li><strong>Spatial Path</strong>: Preserves high-resolution spatial information through a shallow network with wide channels, capturing fine-grained details essential for precise segmentation boundaries.</li>
  <li><strong>Context Path</strong>: Employs a lightweight backbone to aggregate rich contextual information with a large receptive field, enabling accurate semantic understanding.</li>
</ul>

<p>The fusion of these paths through a Feature Fusion Module ensures high segmentation accuracy with low computational cost, making it ideal for applications requiring real-time performance on resource-constrained devices.</p>
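<p>To make the two-path idea concrete, here is a toy NumPy sketch of the fusion step: concatenate both feature maps along the channel axis, then reweight channels with a globally pooled attention gate. The shapes and gating function are illustrative; this is not the trained BiSeNet module:</p>

```python
import numpy as np

def fuse_features(spatial, context):
    """Toy Feature Fusion Module: concatenate the two paths along the
    channel axis, then reweight channels with globally pooled attention.
    Inputs are (C, H, W) feature maps at the same resolution."""
    fused = np.concatenate([spatial, context], axis=0)   # (C1 + C2, H, W)
    pooled = fused.mean(axis=(1, 2), keepdims=True)      # global average pool
    attention = 1.0 / (1.0 + np.exp(-pooled))            # sigmoid gate per channel
    return fused + fused * attention                     # residual reweighting

spatial = np.random.rand(64, 32, 32)   # high-resolution detail path
context = np.random.rand(64, 32, 32)   # semantic context path (upsampled)
out = fuse_features(spatial, context)
```

<p>In the real network the pooled gate passes through learned 1&#215;1 convolutions before the sigmoid; the sketch keeps only the structural idea.</p>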

<p><a href="https://github.com/yakhyo/face-parsing/releases"><img src="https://img.shields.io/github/downloads/yakhyo/face-parsing/total" alt="Downloads" /></a>
<a href="https://github.com/yakhyo/face-parsing/stargazers"><img src="https://img.shields.io/github/stars/yakhyo/face-parsing" alt="GitHub Repo stars" /></a>
<a href="https://github.com/yakhyo/face-parsing"><img src="https://img.shields.io/badge/GitHub-Repository-blue?logo=github" alt="GitHub Repository" /></a></p>

<hr />

<h2 id="key-features">Key Features</h2>

<ul>
  <li><strong>Accurate Facial Parsing</strong>: Segments detailed facial features including eyes, nose, mouth, and hair for precise analysis</li>
  <li><strong>ONNX Support</strong>: Seamless conversion from PyTorch to ONNX format for cross-platform deployment</li>
  <li><strong>Flexible Backbones</strong>: Support for ResNet18 and ResNet34, allowing trade-offs between speed and accuracy</li>
  <li><strong>Production-Ready</strong>: Optimized for real-time applications in AR/VR, digital makeup, and facial analysis systems</li>
</ul>

<hr />

<h2 id="performance-comparison">Performance Comparison</h2>

<h3 id="resnet34-backbone">ResNet34 Backbone</h3>

<div align="center">
<img src="https://yakhyo.github.io/face-parsing/assets/images/1.jpg" width="24%" />
<img src="https://yakhyo.github.io/face-parsing/assets/images/1112.jpg" width="24%" />
<img src="https://yakhyo.github.io/face-parsing/assets/images/1309.jpg" width="24%" />
<img src="https://yakhyo.github.io/face-parsing/assets/images/1321.jpg" width="24%" />
</div>

<div align="center">
<img src="https://yakhyo.github.io/face-parsing/assets/results/resnet34/1.jpg" width="24%" />
<img src="https://yakhyo.github.io/face-parsing/assets/results/resnet34/1112.jpg" width="24%" />
<img src="https://yakhyo.github.io/face-parsing/assets/results/resnet34/1309.jpg" width="24%" />
<img src="https://yakhyo.github.io/face-parsing/assets/results/resnet34/1321.jpg" width="24%" />
</div>

<h3 id="resnet18-backbone">ResNet18 Backbone</h3>

<div align="center">
<img src="https://yakhyo.github.io/face-parsing/assets/results/resnet18/1.jpg" width="24%" />
<img src="https://yakhyo.github.io/face-parsing/assets/results/resnet18/1112.jpg" width="24%" />
<img src="https://yakhyo.github.io/face-parsing/assets/results/resnet18/1309.jpg" width="24%" />
<img src="https://yakhyo.github.io/face-parsing/assets/results/resnet18/1321.jpg" width="24%" />
</div>

<hr />

<h2 id="getting-started">Getting Started</h2>

<p>Clone the repository and install dependencies:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/yakhyo/face-parsing.git
<span class="nb">cd </span>face-parsing
pip <span class="nb">install</span> <span class="nt">-r</span> requirements.txt
</code></pre></div></div>

<p>Pre-trained weights are available for download:</p>
<ul>
  <li><a href="https://github.com/yakhyo/face-parsing/releases/download/v0.0.1/resnet18.pt">ResNet18 Model</a></li>
  <li><a href="https://github.com/yakhyo/face-parsing/releases/download/v0.0.1/resnet34.pt">ResNet34 Model</a></li>
</ul>
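<p>A face-parsing model outputs a per-pixel class map. Visualizing it amounts to mapping class ids to colors, as in the sketch below; the palette and label ids here are illustrative, not the repository's actual label scheme:</p>

```python
import numpy as np

# Illustrative palette: class id -> RGB (real label ids depend on the model).
PALETTE = {0: (0, 0, 0),        # background
           1: (255, 85, 0),     # skin
           2: (0, 255, 255),    # hair
           3: (255, 0, 255)}    # lips

def colorize(parsing_map):
    """Turn an (H, W) array of class ids into an (H, W, 3) RGB image."""
    h, w = parsing_map.shape
    out = np.zeros((h, w, 3), dtype=np.uint8)
    for class_id, color in PALETTE.items():
        out[parsing_map == class_id] = color
    return out

mask = np.array([[0, 1], [2, 3]])  # tiny 2x2 parsing map for illustration
rgb = colorize(mask)
```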

<h2 id="use-cases">Use Cases</h2>

<p>This implementation is particularly useful for:</p>
<ul>
  <li><strong>Digital Makeup Applications</strong>: Precise facial feature segmentation for virtual makeup try-on</li>
  <li><strong>Face Swapping</strong>: Accurate face region extraction for deepfake and face replacement systems</li>
  <li><strong>Facial Analysis</strong>: Detailed feature extraction for emotion recognition and facial attribute analysis</li>
  <li><strong>AR/VR Applications</strong>: Real-time face parsing for augmented reality filters and effects</li>
</ul>

<p>Visit the <a href="https://github.com/yakhyo/face-parsing">GitHub repository</a> for detailed documentation, training scripts, and inference examples.</p>]]></content><author><name>Yakhyokhuja Valikhujaev</name></author><category term="deep-learning" /><category term="computer-vision" /><category term="semantic-segmentation" /><category term="bisenet" /><summary type="html"><![CDATA[BiSeNet (Bilateral Segmentation Network) is a state-of-the-art model for real-time semantic segmentation, initially proposed in the paper Bilateral Segmentation Network for Real-time Semantic Segmentation. The architecture addresses the fundamental challenge in semantic segmentation: achieving high accuracy while maintaining real-time performance.]]></summary></entry><entry><title type="html">Tiny-Face: Ultra-lightweight Face Detection for Mobile and Edge Devices</title><link href="https://yakhyo.github.io/blog/2024/11/efficient-tiny-face-detector/" rel="alternate" type="text/html" title="Tiny-Face: Ultra-lightweight Face Detection for Mobile and Edge Devices" /><published>2024-11-09T12:00:00+09:00</published><updated>2024-11-09T12:00:00+09:00</updated><id>https://yakhyo.github.io/blog/2024/11/efficient-tiny-face-detector</id><content type="html" xml:base="https://yakhyo.github.io/blog/2024/11/efficient-tiny-face-detector/"><![CDATA[<p>Tiny-Face is an ultra-lightweight face detection model specifically designed for deployment on mobile and edge devices where computational resources are limited. Unlike conventional face detection models that prioritize accuracy at the cost of model size and inference speed, Tiny-Face achieves an optimal balance between detection performance and computational efficiency.</p>

<p>Building upon the core concepts of RetinaFace, Tiny-Face introduces several key optimizations that make it practical for real-world deployment on mobile phones, embedded systems, and IoT devices. The model is streamlined to use minimal memory and processing power while maintaining high precision in face detection across various challenging conditions.</p>

<p><a href="https://github.com/yakhyo/tiny-face-pytorch">GitHub Repository</a></p>

<p><a href="https://github.com/yakhyo/tiny-face-pytorch/releases"><img src="https://img.shields.io/github/downloads/yakhyo/tiny-face-pytorch/total" alt="Downloads" /></a>
<a href="https://github.com/yakhyo/tiny-face-pytorch/stargazers"><img src="https://img.shields.io/github/stars/yakhyo/tiny-face-pytorch" alt="GitHub Repo stars" /></a>
<a href="https://github.com/yakhyo/tiny-face-pytorch"><img src="https://img.shields.io/badge/GitHub-Repository-blue?logo=github" alt="GitHub Repository" /></a></p>

<video controls="" autoplay="" loop="" src="https://github.com/user-attachments/assets/faf65b91-db76-4538-beca-87fc65566e51" muted="" width="100%"></video>

<div align="center">
  <img src="https://yakhyo.github.io/tiny-face-pytorch/assets/largeselfi_retina.jpg" />
</div>

<h2 id="key-features">Key Features</h2>

<ul>
  <li><strong>Ultra-lightweight Architecture</strong>: Model sizes ranging from 1.4MB to 1.8MB, ideal for mobile deployment</li>
  <li><strong>Multiple Configurations</strong>: SlimFace, RFB, and MobileNet variants optimized for different resource constraints</li>
  <li><strong>Real-time Performance</strong>: Achieves real-time inference on mobile CPUs without GPU acceleration</li>
  <li><strong>Pretrained Models</strong>: Ready-to-use weights trained on WiderFace dataset for immediate deployment</li>
</ul>

<h2 id="performance-on-widerface-dataset">Performance on WiderFace Dataset</h2>

<h3 id="multi-scale-image-size">Multi-scale Image Size</h3>

<table>
  <thead>
    <tr>
      <th>Models</th>
      <th>Pretrained on ImageNet</th>
      <th>Easy</th>
      <th>Medium</th>
      <th>Hard</th>
      <th>#Params(M)</th>
      <th>Size(MB)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>SlimFace</td>
      <td>False</td>
      <td>79.50%</td>
      <td>79.40%</td>
      <td>68.36%</td>
      <td>0.343</td>
      <td>1.4</td>
    </tr>
    <tr>
      <td>RFB</td>
      <td>False</td>
      <td>80.49%</td>
      <td>81.51%</td>
      <td>75.73%</td>
      <td>0.359</td>
      <td>1.5</td>
    </tr>
    <tr>
      <td>RetinaFace</td>
      <td>True</td>
      <td>87.69%</td>
      <td>86.39%</td>
      <td>80.21%</td>
      <td>0.426</td>
      <td>1.8</td>
    </tr>
  </tbody>
</table>

<h3 id="original-image-size">Original Image Size</h3>

<table>
  <thead>
    <tr>
      <th>Models</th>
      <th>Pretrained on ImageNet</th>
      <th>Easy</th>
      <th>Medium</th>
      <th>Hard</th>
      <th>#Params(M)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>SlimFace</td>
      <td>False</td>
      <td>87.10%</td>
      <td>84.36%</td>
      <td>67.38%</td>
      <td>0.343</td>
    </tr>
    <tr>
      <td>RFB</td>
      <td>False</td>
      <td>87.09%</td>
      <td>84.61%</td>
      <td>69.22%</td>
      <td>0.359</td>
    </tr>
    <tr>
      <td>RetinaFace</td>
      <td>True</td>
      <td>90.26%</td>
      <td>87.48%</td>
      <td>72.85%</td>
      <td>0.426</td>
    </tr>
  </tbody>
</table>

<h2 id="technical-implementation">Technical Implementation</h2>

<p>The model architecture incorporates several optimization techniques:</p>

<ul>
  <li><strong>Depthwise Separable Convolutions</strong>: Reduces computational cost while maintaining representational power</li>
  <li><strong>Feature Pyramid Network</strong>: Multi-scale feature extraction for detecting faces of various sizes</li>
  <li><strong>Efficient Anchor Design</strong>: Optimized anchor boxes specifically tuned for face detection tasks</li>
  <li><strong>Quantization-Friendly</strong>: Architecture designed to maintain accuracy after INT8 quantization</li>
</ul>
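<p>The parameter savings behind depthwise separable convolutions are easy to quantify. The arithmetic below is generic, not a measurement of Tiny-Face itself:</p>

```python
def standard_conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out

def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k conv (one filter per input channel)
    followed by a 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out

std = standard_conv_params(64, 128, 3)        # 73,728 parameters
sep = depthwise_separable_params(64, 128, 3)  # 576 + 8,192 = 8,768 parameters
ratio = std / sep                             # roughly 8.4x fewer parameters
```

<p>For a 3&#215;3 kernel the reduction approaches 9x as the output channel count grows, which is where most of the model-size savings come from.</p>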

<h2 id="use-cases">Use Cases</h2>

<p>Tiny-Face is particularly well-suited for:</p>

<ul>
  <li><strong>Mobile Applications</strong>: Face detection in camera apps, social media filters, and photo editing tools</li>
  <li><strong>Edge Computing</strong>: Real-time face detection on IoT devices and smart cameras</li>
  <li><strong>Embedded Systems</strong>: Integration into resource-constrained hardware for access control and monitoring</li>
  <li><strong>Offline Applications</strong>: Face detection without requiring cloud connectivity or GPU acceleration</li>
</ul>

<p>Explore the <a href="https://github.com/yakhyo/tiny-face-pytorch">GitHub repository</a> for detailed setup instructions, training scripts, and deployment examples for various platforms including Android, iOS, and embedded Linux systems.</p>]]></content><author><name>Yakhyokhuja Valikhujaev</name></author><category term="deep-learning" /><category term="computer-vision" /><category term="facial-recognition" /><summary type="html"><![CDATA[Tiny-Face is an ultra-lightweight face detection model specifically designed for deployment on mobile and edge devices where computational resources are limited. Unlike conventional face detection models that prioritize accuracy at the cost of model size and inference speed, Tiny-Face achieves an optimal balance between detection performance and computational efficiency.]]></summary></entry><entry><title type="html">RetinaFace: Single-stage Dense Face Localisation in the Wild</title><link href="https://yakhyo.github.io/blog/2024/10/high-performance-retinaface-detector/" rel="alternate" type="text/html" title="RetinaFace: Single-stage Dense Face Localisation in the Wild" /><published>2024-10-28T12:00:00+09:00</published><updated>2024-10-28T12:00:00+09:00</updated><id>https://yakhyo.github.io/blog/2024/10/high-performance-retinaface-detector</id><content type="html" xml:base="https://yakhyo.github.io/blog/2024/10/high-performance-retinaface-detector/"><![CDATA[<p>RetinaFace is a robust single-stage face detection framework designed for dense face localisation in unconstrained environments. This implementation provides a production-ready solution with multiple backbone options, enabling flexible deployment across different hardware constraints and accuracy requirements.</p>

<p>The model excels at detecting faces across extreme variations in scale, pose, and occlusion, making it particularly effective for real-world applications where faces may appear at any size or orientation within the image.</p>

<blockquote>
  <p><strong>UniFace Library</strong>: For easier integration, check out <a href="https://github.com/yakhyo/uniface">UniFace</a>, a lightweight Python library built on models from this repository. UniFace provides a simple API for face detection, alignment, and landmark extraction.
<a href="https://pypi.org/project/uniface/"><img src="https://img.shields.io/pypi/v/uniface.svg" alt="PyPI Version" /></a> <a href="https://github.com/yakhyo/uniface/stargazers"><img src="https://img.shields.io/github/stars/yakhyo/uniface" alt="GitHub Stars" /></a></p>
</blockquote>

<p><a href="https://github.com/yakhyo/retinaface-pytorch">GitHub Repository</a></p>

<p><a href="https://github.com/yakhyo/retinaface-pytorch/releases"><img src="https://img.shields.io/github/downloads/yakhyo/retinaface-pytorch/total" alt="Downloads" /></a>
<a href="https://github.com/yakhyo/retinaface-pytorch/stargazers"><img src="https://img.shields.io/github/stars/yakhyo/retinaface-pytorch" alt="GitHub Repo stars" /></a>
<a href="https://github.com/yakhyo/retinaface-pytorch"><img src="https://img.shields.io/badge/GitHub-Repository-blue?logo=github" alt="GitHub Repository" /></a></p>

<video controls="" autoplay="" loop="" src="https://github.com/user-attachments/assets/ad279fea-33fb-43f1-884f-282e6d54c809" muted="" width="100%"></video>

<div align="center">
  <img src="https://yakhyo.github.io/retinaface-pytorch/assets/mv2_test.jpg" />
</div>

<h2 id="key-features">Key Features</h2>

<ul>
  <li><strong>Multiple Backbone Options</strong>: Choose from MobileNetV1 (various width multipliers), MobileNetV2, ResNet18, and ResNet34</li>
  <li><strong>Improved Training Strategy</strong>: Enhanced filtering of small faces (&lt; 16 pixels) to reduce false positives</li>
  <li><strong>ONNX Export Support</strong>: Seamless conversion for deployment on various platforms and inference engines</li>
  <li><strong>Real-time Inference</strong>: Optimized for webcam and video stream processing</li>
  <li><strong>Production-Ready</strong>: Clean, well-documented codebase with reproducible training pipelines</li>
</ul>
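<p>The small-face filtering mentioned above can be sketched in a few lines; the (x1, y1, x2, y2) box format is assumed here for illustration:</p>

```python
def filter_small_faces(boxes, min_size=16):
    """Drop ground-truth boxes whose width or height is below min_size
    pixels, reducing noisy supervision from tiny faces during training."""
    return [b for b in boxes
            if (b[2] - b[0]) >= min_size and (b[3] - b[1]) >= min_size]

boxes = [(0, 0, 10, 10), (0, 0, 40, 50), (5, 5, 25, 20)]
kept = filter_small_faces(boxes)   # only the 40x50 box survives
```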

<h2 id="performance-on-widerface-dataset">Performance on WiderFace Dataset</h2>

<h3 id="multi-scale-image-size">Multi-scale Image Size</h3>

<table>
  <thead>
    <tr>
      <th>RetinaFace Backbones</th>
      <th>Pretrained on ImageNet</th>
      <th>Easy</th>
      <th>Medium</th>
      <th>Hard</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>MobileNetV1 (width mult=0.25)</td>
      <td>True</td>
      <td>88.48%</td>
      <td>87.02%</td>
      <td>80.61%</td>
    </tr>
    <tr>
      <td>MobileNetV1 (width mult=0.50)</td>
      <td>False</td>
      <td>89.42%</td>
      <td>87.97%</td>
      <td>82.40%</td>
    </tr>
    <tr>
      <td>MobileNetV1</td>
      <td>False</td>
      <td>90.59%</td>
      <td>89.14%</td>
      <td>84.13%</td>
    </tr>
    <tr>
      <td>MobileNetV2</td>
      <td>True</td>
      <td>91.70%</td>
      <td>91.03%</td>
      <td>86.60%</td>
    </tr>
    <tr>
      <td>ResNet18</td>
      <td>True</td>
      <td>92.50%</td>
      <td>91.02%</td>
      <td>86.63%</td>
    </tr>
    <tr>
      <td>ResNet34</td>
      <td>True</td>
      <td><strong>94.16%</strong></td>
      <td><strong>93.12%</strong></td>
      <td><strong>88.90%</strong></td>
    </tr>
  </tbody>
</table>

<h3 id="original-image-size">Original Image Size</h3>

<table>
  <thead>
    <tr>
      <th>RetinaFace Backbones</th>
      <th>Pretrained on ImageNet</th>
      <th>Easy</th>
      <th>Medium</th>
      <th>Hard</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>MobileNetV1 (width mult=0.25)</td>
      <td>True</td>
      <td>90.70%</td>
      <td>88.12%</td>
      <td>73.82%</td>
    </tr>
    <tr>
      <td>MobileNetV1 (width mult=0.50)</td>
      <td>False</td>
      <td>91.56%</td>
      <td>89.46%</td>
      <td>76.56%</td>
    </tr>
    <tr>
      <td>MobileNetV1</td>
      <td>False</td>
      <td>92.19%</td>
      <td>90.41%</td>
      <td>79.56%</td>
    </tr>
    <tr>
      <td>MobileNetV2</td>
      <td>True</td>
      <td>94.04%</td>
      <td>92.26%</td>
      <td>83.59%</td>
    </tr>
    <tr>
      <td>ResNet18</td>
      <td>True</td>
      <td>94.28%</td>
      <td>92.69%</td>
      <td>82.95%</td>
    </tr>
    <tr>
      <td>ResNet34</td>
      <td>True</td>
      <td><strong>95.07%</strong></td>
      <td><strong>93.48%</strong></td>
      <td><strong>84.40%</strong></td>
    </tr>
  </tbody>
</table>

<h2 id="architecture-highlights">Architecture Highlights</h2>

<p>RetinaFace incorporates several advanced techniques:</p>

<ul>
  <li><strong>Multi-task Learning</strong>: Simultaneously performs face detection, landmark localization, and 3D face reconstruction</li>
  <li><strong>Feature Pyramid Network</strong>: Enables detection of faces at multiple scales efficiently</li>
  <li><strong>Context Module</strong>: Increases receptive field for better handling of small faces</li>
  <li><strong>Dense Regression</strong>: Pixel-wise prediction for precise face localization</li>
</ul>
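<p>The anchor design can be illustrated by tiling anchor centers over each feature-pyramid level. The strides and sizes below are common illustrative values, not necessarily this implementation's exact configuration:</p>

```python
def generate_anchors(image_size, strides, sizes_per_level):
    """Tile anchors over each feature-map cell.
    Returns (cx, cy, side) triples in pixel coordinates."""
    anchors = []
    for stride, sizes in zip(strides, sizes_per_level):
        cells = image_size // stride
        for y in range(cells):
            for x in range(cells):
                cx = (x + 0.5) * stride   # center of the cell in pixels
                cy = (y + 0.5) * stride
                for s in sizes:
                    anchors.append((cx, cy, s))
    return anchors

# Illustrative configuration: strides 8/16/32 with two anchor sizes each.
anchors = generate_anchors(640, [8, 16, 32], [[16, 32], [64, 128], [256, 512]])
```

<p>Fine strides with small anchors cover tiny faces, while coarse strides with large anchors cover close-up faces, so the pyramid handles the full scale range in one pass.</p>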

<h2 id="quick-start">Quick Start</h2>

<p>Clone the repository:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/yakhyo/retinaface-pytorch.git
<span class="nb">cd </span>retinaface-pytorch
</code></pre></div></div>

<p>Install dependencies:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>pip <span class="nb">install</span> <span class="nt">-r</span> requirements.txt
</code></pre></div></div>

<p>Run webcam inference:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python detect.py <span class="nt">--network</span> mobilenetv1 <span class="nt">--weights</span> retinaface_mv1.pth
</code></pre></div></div>

<h2 id="deployment-options">Deployment Options</h2>

<p>The implementation supports various deployment scenarios:</p>

<ul>
  <li><strong>Python Inference</strong>: Direct PyTorch inference for development and testing</li>
  <li><strong>ONNX Runtime</strong>: Cross-platform deployment with optimized inference</li>
  <li><strong>Mobile Deployment</strong>: Lightweight MobileNet backbones for on-device inference</li>
  <li><strong>Server Deployment</strong>: High-accuracy ResNet backbones for cloud-based services</li>
</ul>

<p>For detailed documentation on training custom models, fine-tuning on specific datasets, and deployment guides, visit the <a href="https://github.com/yakhyo/retinaface-pytorch">GitHub repository</a>.</p>]]></content><author><name>Yakhyokhuja Valikhujaev</name></author><category term="deep-learning" /><category term="computer-vision" /><category term="facial-recognition" /><summary type="html"><![CDATA[RetinaFace is a robust single-stage face detection framework designed for dense face localisation in unconstrained environments. This implementation provides a production-ready solution with multiple backbone options, enabling flexible deployment across different hardware constraints and accuracy requirements.]]></summary></entry><entry><title type="html">Understanding the Geometric Perspective of Vectors in Machine Learning</title><link href="https://yakhyo.github.io/blog/2024/09/math-for-machine-learning/" rel="alternate" type="text/html" title="Understanding the Geometric Perspective of Vectors in Machine Learning" /><published>2024-09-19T12:00:00+09:00</published><updated>2024-09-19T12:00:00+09:00</updated><id>https://yakhyo.github.io/blog/2024/09/math-for-machine-learning</id><content type="html" xml:base="https://yakhyo.github.io/blog/2024/09/math-for-machine-learning/"><![CDATA[<p>While I have a strong mathematical background in calculus and linear algebra, including the ability to perform matrix operations and transformations by hand, I recently discovered a crucial gap in my understanding: the geometric interpretation of vectors and their operations.</p>

<div align="center">
  <img src="https://www.3blue1brown.com/content/lessons/2016/vectors/figures/introduction/Perspectives.svg" />
</div>

<p>This realization came as I began exploring geometric explanations of linear algebra concepts. What were previously abstract lists of numbers or array indices suddenly transformed into intuitive geometric objects—arrows pointing to specific locations in space, representing directions and magnitudes in a visually comprehensible way.</p>

<h2 id="why-geometric-intuition-matters">Why Geometric Intuition Matters</h2>

<p>Understanding vectors geometrically fundamentally changes how we approach machine learning problems:</p>

<ul>
  <li><strong>Feature Spaces</strong>: Each data point becomes a position in high-dimensional space, making concepts like similarity and distance intuitive</li>
  <li><strong>Transformations</strong>: Matrix operations become geometric transformations—rotations, scalings, and projections</li>
  <li><strong>Optimization</strong>: Gradient descent transforms from abstract calculus into following the steepest downhill path in a landscape</li>
  <li><strong>Embeddings</strong>: Word embeddings and latent representations become geometric relationships where similar concepts cluster together</li>
</ul>

<p>This geometric perspective provides deeper insight into why certain machine learning algorithms work and how to debug them when they don’t.</p>
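<p>A small example makes the "matrix as transformation" view concrete: multiplying by a rotation matrix moves a vector around the origin rather than just shuffling numbers:</p>

```python
import math

def rotate(vector, angle_rad):
    """Apply the 2-D rotation matrix [[cos, -sin], [sin, cos]] to a vector."""
    x, y = vector
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return (c * x - s * y, s * x + c * y)

# Rotating the x-axis unit vector by 90 degrees lands it on the y-axis:
vx = (1.0, 0.0)
vy = rotate(vx, math.pi / 2)   # approximately (0.0, 1.0)
```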

<h2 id="recommended-learning-resources">Recommended Learning Resources</h2>

<p>For anyone working in machine learning who hasn’t yet explored the geometric foundations, I highly recommend these courses:</p>

<h3 id="linear-algebra">Linear Algebra</h3>

<ul>
  <li>
    <p><strong>Essence of Linear Algebra</strong> by 3Blue1Brown - <a href="https://www.3blue1brown.com/topics/linear-algebra">Course Link</a>.
An exceptional visual introduction to linear algebra concepts, with stunning animations that build geometric intuition.</p>
  </li>
  <li>
    <p><strong>Mathematics for Machine Learning: Linear Algebra</strong> - <a href="https://www.coursera.org/learn/linear-algebra-machine-learning">Coursera</a>.
Specifically designed for ML practitioners, covering the essential linear algebra needed to understand modern ML algorithms.</p>
  </li>
</ul>

<h3 id="calculus-and-optimization">Calculus and Optimization</h3>

<ul>
  <li><strong>Mathematics for Machine Learning: Multivariate Calculus</strong> - <a href="https://www.coursera.org/learn/multivariate-calculus-machine-learning">Coursera</a>.
Focuses on the calculus concepts most relevant to machine learning, particularly gradient-based optimization.</li>
</ul>

<h3 id="dimensionality-reduction">Dimensionality Reduction</h3>

<ul>
  <li><strong>Mathematics for Machine Learning: PCA</strong> - <a href="https://www.coursera.org/learn/pca-machine-learning">Coursera</a>.
Explores Principal Component Analysis from both mathematical and geometric perspectives.</li>
</ul>

<h2 id="practical-impact">Practical Impact</h2>

<p>This geometric understanding has practical implications for machine learning work:</p>

<ul>
  <li><strong>Better Model Design</strong>: Understanding how transformations affect data helps in designing better architectures</li>
  <li><strong>Debugging</strong>: Geometric intuition makes it easier to identify why models fail on certain data</li>
  <li><strong>Feature Engineering</strong>: Knowing how features interact geometrically guides better feature design</li>
  <li><strong>Interpretability</strong>: Geometric perspective aids in explaining model decisions to non-technical stakeholders</li>
</ul>

<p>Investing time in building this geometric intuition is one of the most valuable things you can do as a machine learning practitioner. It transforms machine learning from a collection of algorithms and formulas into an intuitive, visual discipline where you can reason about what models are doing and why.</p>]]></content><author><name>Yakhyokhuja Valikhujaev</name></author><category term="mathematics" /><category term="machine-learning" /><category term="linear-algebra" /><category term="calculus" /><summary type="html"><![CDATA[While I have a strong mathematical background in calculus and linear algebra, including the ability to perform matrix operations and transformations by hand, I recently discovered a crucial gap in my understanding: the geometric interpretation of vectors and their operations.]]></summary></entry><entry><title type="html">Real-Time Gaze Estimation Using Lightweight Deep Learning Models</title><link href="https://yakhyo.github.io/blog/2024/09/gaze-estimation/" rel="alternate" type="text/html" title="Real-Time Gaze Estimation Using Lightweight Deep Learning Models" /><published>2024-09-18T12:00:00+09:00</published><updated>2024-09-18T12:00:00+09:00</updated><id>https://yakhyo.github.io/blog/2024/09/gaze-estimation</id><content type="html" xml:base="https://yakhyo.github.io/blog/2024/09/gaze-estimation/"><![CDATA[<p>This project focuses on predicting gaze direction using lightweight deep learning models optimized for real-time performance on mobile devices. The implementation combines classification and regression techniques to create an efficient and accurate solution suitable for deployment on resource-constrained hardware.</p>
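<p>A common way to combine classification and regression for angle prediction is to softmax over discrete angle bins and take the expectation, which turns a classifier's output into a continuous angle. The bin layout below is illustrative and may differ from this project's exact scheme:</p>

```python
import math

def expected_angle(logits, bin_centers_deg):
    """Softmax over angle bins, then take the expectation to get a
    continuous angle in degrees."""
    m = max(logits)                                  # stabilize the softmax
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sum(p * c for p, c in zip(probs, bin_centers_deg))

# Illustrative: 5 bins centered at -60, -30, 0, 30, 60 degrees.
bins = [-60, -30, 0, 30, 60]
angle = expected_angle([0.1, 2.0, 4.0, 1.0, 0.0], bins)  # a continuous angle
```

<p>Training with a classification loss on the bins plus a regression loss on the expectation tends to stabilize learning compared with direct angle regression.</p>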

<iframe width="100%" height="480" src="https://www.youtube.com/embed/q-uxquFdPB8?si=hrtMjo17zfI4-SPq" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<p><a href="https://github.com/yakhyo/gaze-estimation">GitHub Repository</a></p>

<h2 id="applications-and-use-cases">Applications and Use Cases</h2>

<p>Gaze estimation technology enables a wide range of applications across multiple domains:</p>

<ul>
  <li><strong>Mobile User Experience</strong>: Hands-free navigation and attention-aware interfaces</li>
  <li><strong>Virtual and Augmented Reality</strong>: Natural interaction through eye tracking in VR/AR systems</li>
  <li><strong>Accessibility</strong>: Assistive technologies for users with limited mobility</li>
  <li><strong>Automotive Safety</strong>: Driver attention monitoring and drowsiness detection</li>
  <li><strong>Human-Computer Interaction</strong>: Intuitive control mechanisms for various devices</li>
  <li><strong>Market Research</strong>: Understanding user attention patterns and visual behavior</li>
</ul>

<h2 id="model-architecture-and-design">Model Architecture and Design</h2>

<p>The project implements multiple lightweight architectures, each optimized for different deployment scenarios:</p>

<h3 id="resnet-variants">ResNet Variants</h3>

<p>The ResNet variants employ residual learning to train deeper networks without accuracy degradation. The residual connections allow gradients to flow more effectively during training, resulting in better accuracy without significant computational overhead.</p>

<h3 id="mobilenet-v2">MobileNet v2</h3>

<p>Specifically designed for mobile deployment, MobileNet v2 introduces inverted residual structures and linear bottlenecks. This architecture achieves an optimal balance between model size, inference speed, and accuracy, making it ideal for on-device gaze estimation.</p>
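<p>To make the efficiency argument concrete, the snippet below compares parameter counts for a standard 3×3 convolution and the depthwise-separable factorization used inside MobileNet-style blocks. The channel sizes are illustrative and are not taken from the actual gaze model:</p>

```python
# Parameter counts: standard 3x3 convolution vs. the depthwise-separable
# factorization used in MobileNet-style blocks. Channel sizes are
# illustrative only.

def standard_conv_params(c_in, c_out, k=3):
    # every output channel convolves every input channel with a k x k kernel
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k=3):
    # depthwise k x k (one filter per input channel) + pointwise 1x1 projection
    return c_in * k * k + c_in * c_out

c_in, c_out = 128, 128
std = standard_conv_params(c_in, c_out)        # 147456 parameters
sep = depthwise_separable_params(c_in, c_out)  # 17536 parameters
print(std, sep, round(std / sep, 1))           # roughly an 8.4x reduction
```

<p>This order-of-magnitude reduction in parameters (and a similar reduction in multiply-accumulates) is the main reason such blocks suit on-device inference.</p>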

<h3 id="mobileone-s0-s4">MobileOne (s0-s4)</h3>

<p>The MobileOne family represents the state-of-the-art in mobile-optimized architectures. With variants ranging from s0 to s4, it offers flexibility in trading off between speed and accuracy. The architecture is specifically optimized for mobile CPUs, achieving impressive real-time performance without GPU acceleration.</p>

<h3 id="face-detection-integration">Face Detection Integration</h3>

<p>The system integrates SCRFD (Sample and Computation Redistribution for Efficient Face Detection) for robust face localization. SCRFD provides:</p>

<ul>
  <li>Fast inference suitable for real-time applications</li>
  <li>High accuracy across various face scales and poses</li>
  <li>Efficient resource utilization for mobile deployment</li>
  <li>Reliable performance in challenging lighting conditions</li>
</ul>

<h2 id="technical-implementation">Technical Implementation</h2>

<p>The gaze estimation pipeline consists of several stages:</p>

<ol>
  <li><strong>Face Detection</strong>: SCRFD localizes faces in the input frame</li>
  <li><strong>Face Alignment</strong>: Detected faces are normalized to a standard pose</li>
  <li><strong>Eye Region Extraction</strong>: Precise localization of eye regions for gaze prediction</li>
  <li><strong>Gaze Prediction</strong>: Deep learning model estimates gaze direction as pitch and yaw angles</li>
  <li><strong>Temporal Smoothing</strong>: Optional filtering to reduce jitter in video streams</li>
</ol>
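<p>The final prediction in step 4 is a pair of angles, which is commonly converted into a unit 3D direction vector for visualization or downstream geometry. The sign conventions in this sketch are one common choice in the gaze literature and not necessarily the ones used by the repository:</p>

```python
import math

def gaze_to_vector(pitch, yaw):
    """Convert pitch/yaw angles (radians) into a unit 3D gaze direction.

    Sign conventions vary between datasets; this one (camera looking
    along -z) is a common choice, assumed here for illustration.
    """
    x = -math.cos(pitch) * math.sin(yaw)
    y = -math.sin(pitch)
    z = -math.cos(pitch) * math.cos(yaw)
    return (x, y, z)

vx, vy, vz = gaze_to_vector(0.0, 0.0)
print(vx, vy, vz)  # looking straight into the camera along -z
```

<p>Because the components are built from sines and cosines of the same two angles, the resulting vector always has unit length, so no normalization step is needed.</p>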

<h2 id="performance-characteristics">Performance Characteristics</h2>

<p>The implementation achieves:</p>

<ul>
  <li>Real-time inference (30+ FPS) on modern mobile devices</li>
  <li>Low latency suitable for interactive applications</li>
  <li>Minimal battery impact through efficient computation</li>
  <li>Robust performance across different lighting conditions and head poses</li>
</ul>

<p>The complete implementation, including training scripts, pre-trained models, and deployment examples, is available on <a href="https://github.com/yakhyo/gaze-estimation">GitHub</a>.</p>]]></content><author><name>Yakhyokhuja Valikhujaev</name></author><category term="deep-learning" /><category term="computer-vision" /><category term="machine-learning" /><category term="neural-networks" /><category term="gaze-estimation" /><summary type="html"><![CDATA[This project focuses on predicting gaze direction using lightweight deep learning models optimized for real-time performance on mobile devices. The implementation combines classification and regression techniques to create an efficient and accurate solution suitable for deployment on resource-constrained hardware.]]></summary></entry><entry><title type="html">Real-Time Head Pose Estimation with Efficient Deep Learning Backbones</title><link href="https://yakhyo.github.io/blog/2024/09/head-pose-estimation/" rel="alternate" type="text/html" title="Real-Time Head Pose Estimation with Efficient Deep Learning Backbones" /><published>2024-09-17T12:00:00+09:00</published><updated>2024-09-17T12:00:00+09:00</updated><id>https://yakhyo.github.io/blog/2024/09/head-pose-estimation</id><content type="html" xml:base="https://yakhyo.github.io/blog/2024/09/head-pose-estimation/"><![CDATA[<p>This project delivers accurate real-time head pose estimation through optimized deep learning architectures. The implementation focuses on achieving high performance across various deployment scenarios, from mobile devices to desktop applications, while maintaining robust accuracy in challenging conditions.</p>

<iframe width="100%" height="480" src="https://www.youtube.com/embed/DF2mAlwRr04?si=a2I57L8x8KT6bdDS" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen=""></iframe>

<p><a href="https://github.com/yakhyo/head-pose-estimation">GitHub Repository</a></p>

<h2 id="applications-and-industry-impact">Applications and Industry Impact</h2>

<p>Head pose estimation plays a critical role in numerous applications:</p>

<ul>
  <li><strong>Augmented and Virtual Reality</strong>: Natural user interaction through head tracking</li>
  <li><strong>Attention Monitoring</strong>: Understanding user focus in educational and workplace settings</li>
  <li><strong>Driver Safety Systems</strong>: Detecting driver distraction and drowsiness in automotive applications</li>
  <li><strong>Human-Computer Interaction</strong>: Enabling intuitive control mechanisms</li>
  <li><strong>Video Conferencing</strong>: Automatic camera adjustment and gaze correction</li>
  <li><strong>Accessibility Technologies</strong>: Assistive systems for users with limited mobility</li>
</ul>

<h2 id="technical-architecture">Technical Architecture</h2>

<p>The system integrates multiple components to achieve robust performance:</p>

<h3 id="backbone-networks">Backbone Networks</h3>

<p><strong>ResNet Variants</strong></p>

<p>ResNet architectures provide the foundation for high-accuracy head pose estimation. The residual connections enable training of deeper networks, capturing complex patterns in head orientation across various poses and lighting conditions.</p>
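<p>The core idea is that a block learns a residual F(x) and outputs F(x) + x, so the skip connection preserves the input even when the learned branch contributes little. A toy fully-connected sketch (layer sizes are illustrative, not from the actual model):</p>

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # learned residual branch: two linear layers with a ReLU in between
    f = relu(x @ w1) @ w2
    # skip connection adds the input back before the final activation
    return relu(f + x)

x = np.array([1.0, -2.0, 3.0])
zero = np.zeros((3, 3))
print(residual_block(x, zero, zero))  # [1. 0. 3.]: the identity path survives
```

<p>With all weights at zero the block reduces to the identity (up to the final ReLU), which is why stacking many such blocks does not degrade training the way plain deep stacks do.</p>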

<p><strong>MobileNet v2 and v3</strong></p>

<p>MobileNet architectures are specifically optimized for mobile deployment:</p>
<ul>
  <li>Inverted residual structures reduce computational requirements</li>
  <li>Depthwise separable convolutions minimize parameter count</li>
  <li>Hardware-aware network design ensures efficient inference on mobile processors</li>
  <li>Maintains accuracy while achieving real-time performance on edge devices</li>
</ul>

<h3 id="face-detection-pipeline">Face Detection Pipeline</h3>

<p>The system incorporates SCRFD (Sample and Computation Redistribution for Efficient Face Detection) for robust face localization:</p>

<ul>
  <li>High-speed detection suitable for real-time video processing</li>
  <li>Accurate localization across various scales and orientations</li>
  <li>Efficient resource utilization for mobile deployment</li>
  <li>Reliable performance in challenging environmental conditions</li>
</ul>

<h2 id="implementation-details">Implementation Details</h2>

<p>The head pose estimation pipeline consists of:</p>

<ol>
  <li><strong>Face Detection</strong>: SCRFD localizes faces in the input frame with high precision</li>
  <li><strong>Face Preprocessing</strong>: Detected faces are normalized and aligned to a standard format</li>
  <li><strong>Pose Estimation</strong>: Deep learning model predicts Euler angles (pitch, yaw, roll)</li>
  <li><strong>Temporal Filtering</strong>: Optional smoothing to reduce jitter in video streams</li>
  <li><strong>Visualization</strong>: Real-time rendering of estimated head orientation</li>
</ol>
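<p>For step 3, the predicted Euler angles can be assembled into a rotation matrix for rendering the head orientation. The composition order below is one common convention and may differ from the repository's implementation:</p>

```python
import math

def euler_to_rotation(pitch, yaw, roll):
    """Build a 3x3 rotation matrix from Euler angles in radians.

    Composition order R = Rz(roll) @ Ry(yaw) @ Rx(pitch) is one common
    convention, assumed here for illustration.
    """
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    cr, sr = math.cos(roll), math.sin(roll)
    rx = [[1, 0, 0], [0, cp, -sp], [0, sp, cp]]   # rotation about x (pitch)
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]   # rotation about y (yaw)
    rz = [[cr, -sr, 0], [sr, cr, 0], [0, 0, 1]]   # rotation about z (roll)

    def matmul(a, b):
        return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
                for i in range(3)]

    return matmul(rz, matmul(ry, rx))

R = euler_to_rotation(0.0, 0.0, 0.0)
print(R)  # identity matrix: no rotation
```

<p>The three columns of the resulting matrix are the rotated x, y, and z axes, which is exactly what the visualization step draws on the face.</p>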

<h2 id="performance-characteristics">Performance Characteristics</h2>

<p>The implementation achieves:</p>

<ul>
  <li>Real-time inference (30+ FPS) on modern mobile devices</li>
  <li>Sub-100ms latency suitable for interactive applications</li>
  <li>Robust accuracy across diverse demographics and lighting conditions</li>
  <li>Efficient memory usage enabling deployment on resource-constrained devices</li>
</ul>

<h2 id="model-selection-guide">Model Selection Guide</h2>

<p>Choose the appropriate backbone based on your deployment requirements:</p>

<ul>
  <li><strong>ResNet50</strong>: Highest accuracy, suitable for server-side deployment or powerful edge devices</li>
  <li><strong>ResNet34</strong>: Balanced accuracy and speed for desktop applications</li>
  <li><strong>MobileNet v3</strong>: Optimal for mobile devices requiring real-time performance</li>
  <li><strong>MobileNet v2</strong>: Legacy mobile support with proven reliability</li>
</ul>

<p>The complete implementation, including training scripts, pre-trained weights, and deployment examples for various platforms, is available on <a href="https://github.com/yakhyo/head-pose-estimation">GitHub</a>.</p>]]></content><author><name>Yakhyokhuja Valikhujaev</name></author><category term="deep-learning" /><category term="computer-vision" /><category term="machine-learning" /><category term="neural-networks" /><category term="head-pose-estimation" /><summary type="html"><![CDATA[This project delivers accurate real-time head pose estimation through optimized deep learning architectures. The implementation focuses on achieving high performance across various deployment scenarios, from mobile devices to desktop applications, while maintaining robust accuracy in challenging conditions.]]></summary></entry></feed>