JDA Comparison Framework

A framework for comparing transfer learning methods on domain adaptation tasks. It implements NN, PCA, TCA, GFK, TSL, and JDA, each based on its original paper.

Overview

This framework reproduces experiments from the JDA (Joint Distribution Adaptation) paper and provides a unified interface for comparing different transfer learning methods.

Paper Abstract

Transfer learning is established as an effective technology in computer vision for leveraging rich labeled data in the source domain to build an accurate classifier for the target domain. However, most prior methods have not simultaneously reduced the difference in both the marginal distribution and conditional distribution between domains. In this paper, we put forward a novel transfer learning approach, referred to as Joint Distribution Adaptation (JDA). Specifically, JDA aims to jointly adapt both the marginal distribution and conditional distribution in a principled dimensionality reduction procedure, and construct new feature representation that is effective and robust for substantial distribution difference. Extensive experiments verify that JDA can significantly outperform several state-of-the-art methods on four types of cross-domain image classification problems.

Method Overview

Implemented Methods

Method	Full Name	Reference
NN	Nearest Neighbor	Baseline
PCA	Principal Component Analysis	Baseline
TCA	Transfer Component Analysis	Pan et al., TNN 2011
GFK	Geodesic Flow Kernel	Gong et al., CVPR 2012
TSL	Transfer Subspace Learning	Si et al., TKDE 2010
JDA	Joint Distribution Adaptation	Long et al., ICCV 2013

Project Structure

jda_project/
├── jda_comparison.py         # Main comparison framework
├── tune_parameters.py        # Parameter tuning and optimization
├── run_experiments.py        # Batch experiment runner
├── fig4_final.py             # Figure 4 generation
├── README.md                 # This file
├── .gitignore                # Git ignore rules
├── Method Introduction.png   # JDA method overview diagram
├── data/                     # Dataset files
│   ├── digit/
│   │   └── MNIST_vs_USPS.mat
│   ├── coil/
│   │   ├── COIL_1.mat
│   │   └── COIL_2.mat
│   ├── pie/
│   │   ├── PIE1.mat
│   │   ├── PIE2.mat
│   │   ├── PIE3.mat
│   │   ├── PIE4.mat
│   │   └── PIE5.mat
│   ├── surf/
│   │   ├── amazon_zscore_SURF_L10.mat
│   │   ├── Caltech10_zscore_SURF_L10.mat
│   │   ├── dslr_zscore_SURF_L10.mat
│   │   └── webcam_zscore_SURF_L10.mat
│   ├── prepared_mnist_usps/  # Preprocessed MNIST-USPS data
│   └── prepared_pie/        # Preprocessed PIE data
└── paper_experiments/        # Paper experiment results and plots

Installation

Using UV (Recommended)

# Clone the repository
git clone https://github.com/caroline1677/COMP7404-Group16-JDA-Reproduction.git
cd COMP7404-Group16-JDA-Reproduction

# Install dependencies with UV
uv sync

# Or install in development mode
uv pip install -e .

Using pip

pip install numpy scipy scikit-learn

Visualization Website

Check out our interactive visualization of JDA results: https://visualization3.vercel.app/

Quick Test

Verify your installation by running a simple test:

python jda_comparison.py --dataset digit --src USPS --tar MNIST

Expected output (runtimes will vary by machine):

============================================================
Transfer Learning Comparison: USPS -> MNIST
Dataset: digit, Dim: 100, Lambda: 0.1, JDA Iter: 10, TSL Iter: 10
============================================================
| Method | Accuracy | Runtime (s) |
|--------|----------|-------------|
| NN     |   64.44% |       0.123 |
| PCA    |   65.06% |       0.234 |
| GFK    |   31.89% |       0.456 |
| TCA    |   58.11% |       0.789 |
| TSL    |   58.94% |       1.234 |
| JDA    |   72.44% |       2.345 |

Dataset Structure

Place your data files in the following structure:

jda_project/
├── data/
│   ├── digit/
│   │   └── MNIST_vs_USPS.mat
│   ├── coil/
│   │   ├── COIL_1.mat
│   │   └── COIL_2.mat
│   ├── pie/
│   │   ├── PIE1.mat
│   │   ├── PIE2.mat
│   │   ├── PIE3.mat
│   │   ├── PIE4.mat
│   │   └── PIE5.mat
│   └── surf/
│       ├── amazon_zscore_SURF_L10.mat
│       ├── Caltech10_zscore_SURF_L10.mat
│       ├── dslr_zscore_SURF_L10.mat
│       └── webcam_zscore_SURF_L10.mat

Usage

Preset Dataset Mode (Recommended for Standard Datasets)

Run a single experiment using preset datasets:

python jda_comparison.py --dataset digit --src USPS --tar MNIST
python jda_comparison.py --dataset coil --src COIL1 --tar COIL2
python jda_comparison.py --dataset pie --src PIE1 --tar PIE4
python jda_comparison.py --dataset surf --src webcam --tar dslr

Custom Data Mode (For Your Own Data)

Use your own .mat files by specifying file paths and variable names:

python jda_comparison.py \
    --src-file data/source.mat --src-feat X --src-label y \
    --tar-file data/target.mat --tar-feat X --tar-label y \
    --dim 100 --lamb 0.1
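
As a rough illustration of what the custom mode needs from a .mat file, the sketch below reads a feature matrix and a label vector by variable name with scipy.io.loadmat. The helper name load_domain and the one-sample-per-row layout are assumptions made for this example, not taken from jda_comparison.py; transpose the features if your file stores one sample per column.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def load_domain(path, feat_key, label_key):
    """Read one domain from a .mat file (hypothetical helper).

    Assumes the feature matrix holds one sample per row.
    """
    mat = loadmat(path)
    for key in (feat_key, label_key):
        if key not in mat:
            available = [k for k in mat if not k.startswith("__")]
            raise KeyError(f"{key!r} not found in {path}; available: {available}")
    X = np.asarray(mat[feat_key], dtype=np.float64)
    # MATLAB stores vectors as 2-D arrays, so flatten the labels.
    y = np.asarray(mat[label_key]).ravel()
    if X.shape[0] != y.shape[0]:
        raise ValueError(f"{X.shape[0]} feature rows but {y.shape[0]} labels")
    return X, y


# Round trip through a temporary file so the sketch runs without real data.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "source.mat")
    savemat(path, {"X": np.arange(12.0).reshape(4, 3), "y": np.array([1, 2, 1, 2])})
    X, y = load_domain(path, "X", "y")
```

Listing the available variable names in the error message makes it easier to pick the right values for --src-feat and --src-label.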

Command Line Options

Argument	Type	Default	Description
Data Input (choose one mode)
`--dataset`	str	None	Dataset type: digit, coil, pie, surf
`--src`	str	None	Source domain name (with --dataset)
`--tar`	str	None	Target domain name (with --dataset)
`--data-dir`	str	data	Path to data directory
`--src-file`	str	None	Path to source .mat file (custom mode)
`--src-feat`	str	None	Variable name for source features
`--src-label`	str	None	Variable name for source labels
`--tar-file`	str	None	Path to target .mat file (custom mode)
`--tar-feat`	str	None	Variable name for target features
`--tar-label`	str	None	Variable name for target labels
Method Parameters
`--dim`	int	100	Subspace dimensionality
`--lamb`	float	0.1	Regularization (TCA/TSL/JDA)
`--iter`	int	10	Default iterations for JDA and TSL
`--jda-iter`	int	10	Iterations for JDA specifically
`--tsl-iter`	int	10	Iterations for TSL specifically
Method-Specific Parameters
`--pca-dim`	int	dim	Dimensionality for PCA
`--gfk-dim`	int	dim	Dimensionality for GFK
`--tca-dim`	int	dim	Dimensionality for TCA
`--tca-lamb`	float	lamb	Regularization for TCA
`--tsl-dim`	int	dim	Dimensionality for TSL
`--tsl-lamb`	float	lamb	Regularization for TSL
`--jda-dim`	int	dim	Dimensionality for JDA
`--jda-lamb`	float	lamb	Regularization for JDA
Output Options
`--methods`	str	all	Methods: 'all' or comma-separated (nn,pca,tca,gfk,tsl,jda)
`--parallel`	flag	False	Run methods in parallel (multi-threaded)
`--workers`	int	4	Number of parallel workers
`--output`	str	None	Save results to CSV file
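
The method-specific options default to the shared --dim and --lamb values. One way such a fallback could be wired up with argparse is sketched below; the helper names build_parser and resolve are hypothetical, and the actual script may resolve defaults differently.

```python
import argparse


def build_parser():
    """Minimal parser covering only the shared and method-specific options."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--dim", type=int, default=100)
    parser.add_argument("--lamb", type=float, default=0.1)
    for method in ("pca", "gfk", "tca", "tsl", "jda"):
        # None means "not given", so the shared value is used instead.
        parser.add_argument(f"--{method}-dim", type=int, default=None)
    for method in ("tca", "tsl", "jda"):
        parser.add_argument(f"--{method}-lamb", type=float, default=None)
    return parser


def resolve(args, method, name):
    """Return the method-specific value if set, else the shared one."""
    specific = getattr(args, f"{method}_{name}", None)
    return specific if specific is not None else getattr(args, name)


args = build_parser().parse_args(
    ["--dim", "50", "--jda-dim", "100", "--tsl-lamb", "1.0"]
)
pca_dim = resolve(args, "pca", "dim")    # falls back to --dim
jda_dim = resolve(args, "jda", "dim")    # uses --jda-dim
tca_lamb = resolve(args, "tca", "lamb")  # falls back to --lamb
tsl_lamb = resolve(args, "tsl", "lamb")  # uses --tsl-lamb
```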

Examples

Run with custom parameters:

python jda_comparison.py --dataset surf --src webcam --tar dslr --dim 50 --lamb 1.0 --jda-iter 15

Run with method-specific parameters:

python jda_comparison.py --dataset digit --src USPS --tar MNIST \
    --methods pca,gfk,tca,tsl,jda \
    --pca-dim 40 \
    --gfk-dim 60 \
    --tca-dim 50 --tca-lamb 0.1 \
    --tsl-dim 80 --tsl-lamb 1.0 \
    --jda-dim 100 --jda-lamb 0.1

Run only specific methods:

python jda_comparison.py --dataset coil --src COIL1 --tar COIL2 --methods nn,pca,jda

Run in parallel (recommended for large datasets):

python jda_comparison.py --dataset pie --src PIE1 --tar PIE4 --parallel --workers 4

Save results to CSV:

python jda_comparison.py --dataset pie --src PIE1 --tar PIE4 --output results.csv

Batch Processing

Run multiple experiments from a config file:

python run_experiments.py experiments_config.csv full_results.csv

Configuration File Format

Preset Mode

dataset,src,tar,dim,lamb,iter
digit,USPS,MNIST,100,0.1,10
coil,COIL1,COIL2,100,0.1,10
pie,PIE1,PIE4,100,0.1,10
surf,webcam,dslr,100,1.0,10

Custom Mode

src_file,src_feat,src_label,tar_file,tar_feat,tar_label,dim,lamb,iter,jda_iter,tsl_iter
data/source1.mat,X,Y,data/target1.mat,X,Y,100,0.1,10,10,10
data/source2.mat,features,labels,data/target2.mat,features,labels,100,0.1,15,15,5

Output Format

Results are saved in CSV format with accuracy and runtime for each method:

Task,NN_Acc,NN_Time,PCA_Acc,PCA_Time,GFK_Acc,GFK_Time,TCA_Acc,TCA_Time,TSL_Acc,TSL_Time,JDA_Acc,JDA_Time
USPS -> MNIST,64.44,0.123,65.06,0.234,31.89,0.456,58.11,0.789,58.94,1.234,72.44,2.345
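
A results file in this layout can be post-processed with the standard csv module. The sketch below picks the most accurate method per task; it assumes only the column naming shown above (a Task column plus <METHOD>_Acc and <METHOD>_Time pairs) and reads from an in-memory string instead of a real file.

```python
import csv
import io

# Same header and row as the example above.
sample = (
    "Task,NN_Acc,NN_Time,PCA_Acc,PCA_Time,GFK_Acc,GFK_Time,"
    "TCA_Acc,TCA_Time,TSL_Acc,TSL_Time,JDA_Acc,JDA_Time\n"
    "USPS -> MNIST,64.44,0.123,65.06,0.234,31.89,0.456,"
    "58.11,0.789,58.94,1.234,72.44,2.345\n"
)


def best_method(row):
    """Return (method, accuracy) for the highest *_Acc column in one row."""
    accs = {
        key[: -len("_Acc")]: float(value)
        for key, value in row.items()
        if key.endswith("_Acc")
    }
    name = max(accs, key=accs.get)
    return name, accs[name]


rows = list(csv.DictReader(io.StringIO(sample)))
summary = {row["Task"]: best_method(row) for row in rows}
print(summary)  # {'USPS -> MNIST': ('JDA', 72.44)}
```

To read a saved file, replace io.StringIO(sample) with an open file handle, for example open("results.csv", newline="").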

Parameter Tuning

Perform grid search to find optimal hyperparameters for each method:

# Tune all methods (sequential)
python tune_parameters.py --dataset digit --src USPS --tar MNIST

# Tune specific methods
python tune_parameters.py --dataset digit --src USPS --tar MNIST --methods pca,gfk,tca

# Compare with original paper results (find parameters closest to paper accuracy)
python tune_parameters.py --dataset digit --src USPS --tar MNIST --compare-paper

# Run with parallel parameter search (faster for large parameter grid)
# Methods run sequentially (one at a time) to avoid mixed output
# Internal parameter search is parallelized using multiple workers
python tune_parameters.py --dataset digit --src USPS --tar MNIST --parallel --workers 4

# PIE dataset with 8 workers
python tune_parameters.py --dataset pie --src PIE2 --tar PIE5 --parallel --workers 8

How Parallel Execution Works

Sequential (default): Each method runs one after another. Inside each method, parameters are tested one by one.
Parallel (--parallel --workers N): Each method still runs one after another (to keep output clean), but inside each method, the parameter grid search is parallelized using N threads.

Example with --workers 4:

PCA: Tests k=10,20,30,...,150 in parallel (4 at a time)
Then GFK: Tests k=10,20,30,...,150 in parallel (4 at a time)
Then TCA: Tests 15*3=45 combinations in parallel (4 at a time)
And so on...
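
The pattern described above, one method at a time with its parameter grid spread over N threads, can be sketched with concurrent.futures. This is an illustration only: evaluate is a hypothetical stand-in that returns a placeholder score instead of training a model, and the real tune_parameters.py may be organized differently.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def evaluate(params):
    """Hypothetical stand-in for one train-and-score run.

    Returns (params, score). The score is a placeholder so the sketch
    runs without data; it is not a real accuracy.
    """
    k, lamb = params
    return params, 1.0 / (1.0 + abs(k - 100) / 100 + abs(lamb - 0.1))


def grid_search(k_values, lamb_values, workers=4):
    """Evaluate every (k, lambda) pair, at most `workers` at a time."""
    grid = list(product(k_values, lamb_values))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(evaluate, grid))
    return grid, max(results, key=lambda item: item[1])


# 15 values of k times 3 values of lambda gives the 45 combinations
# mentioned in the TCA example above.
grid, (best_params, best_score) = grid_search(range(10, 151, 10), [0.01, 0.1, 1.0])
```

Because pool.map returns results in submission order, the output stays deterministic even though the runs overlap in time.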

Search Space

Method	k Range	λ Values
PCA	10,20,30,...,200	-
GFK	10,20,30,...,200	-
TCA	10,20,30,...,200	0.01, 0.1, 1.0
TSL	10,20,30,...,150	0.01, 0.1, 1.0
JDA	10,20,30,...,150	0.01, 0.1, 1.0

Note: Time shown is the average per experiment.

Method Details

1. Nearest Neighbor (NN)

Baseline classifier using 1-NN with Euclidean distance.

2. Principal Component Analysis (PCA)

Dimensionality reduction using PCA followed by 1-NN classification.

3. Transfer Component Analysis (TCA)

Adapts only the marginal distribution P(x)
Uses MMD (Maximum Mean Discrepancy) to measure distribution distance
Learns transfer components in RKHS
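
To make the MMD step concrete, the sketch below builds the marginal MMD matrix and solves the TCA eigenproblem with a linear kernel. It is a simplified illustration of the formulation in Pan et al. under stated assumptions (linear kernel, dense eigen-solver, small data); the function names are hypothetical and this is not the repository's implementation.

```python
import numpy as np


def mmd_matrix(ns, nt):
    """Marginal MMD matrix M = e e^T with e = [1/ns, ..., -1/nt, ...]."""
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    return np.outer(e, e)


def tca_linear(Xs, Xt, dim, lamb):
    """TCA sketch with a linear kernel; returns embedded source and target."""
    X = np.vstack([Xs, Xt])
    n = X.shape[0]
    K = X @ X.T                              # linear kernel matrix
    M = mmd_matrix(len(Xs), len(Xt))
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    # Transfer components: leading eigenvectors of (K M K + lamb I)^-1 K H K
    A = np.linalg.solve(K @ M @ K + lamb * np.eye(n), K @ H @ K)
    vals, vecs = np.linalg.eig(A)
    order = np.argsort(-vals.real)[:dim]
    W = vecs[:, order].real
    Z = K @ W
    return Z[: len(Xs)], Z[len(Xs):]


rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, (30, 5))
Xt = rng.normal(0.5, 1.0, (20, 5))

# With a linear kernel, tr(K M) is the squared distance between domain means.
X = np.vstack([Xs, Xt])
K = X @ X.T
M = mmd_matrix(len(Xs), len(Xt))
mmd = np.trace(K @ M)
mean_gap = np.sum((Xs.mean(axis=0) - Xt.mean(axis=0)) ** 2)

Zs, Zt = tca_linear(Xs, Xt, dim=2, lamb=0.1)
```

The identity tr(K M) = ||mean(Xs) - mean(Xt)||^2 is what makes M a compact way to express the distance between the two marginal distributions.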

4. Geodesic Flow Kernel (GFK)

Manifold-based domain adaptation
Computes geodesic flow between source and target subspaces
Uses kernel distance for classification: D = diag(K_ss) + diag(K_tt) - 2*K_st
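
The distance formula above can be evaluated without forming the full K_ss and K_tt matrices, since only their diagonals are needed. In this sketch G stands for the geodesic flow kernel matrix, which is assumed to be computed elsewhere; the identity matrix is used as a placeholder so the result can be checked against plain squared Euclidean distance. The function name is hypothetical.

```python
import numpy as np


def kernel_distance(Xs, Xt, G):
    """D[i, j] = K_ss[i, i] + K_tt[j, j] - 2 * K_st[i, j] for kernel matrix G."""
    kss_diag = np.einsum("ij,jk,ik->i", Xs, G, Xs)  # x_i^T G x_i per source row
    ktt_diag = np.einsum("ij,jk,ik->i", Xt, G, Xt)  # x_j^T G x_j per target row
    Kst = Xs @ G @ Xt.T
    # Source diagonal broadcasts down columns, target diagonal across rows.
    return kss_diag[:, None] + ktt_diag[None, :] - 2.0 * Kst


rng = np.random.default_rng(0)
Xs = rng.normal(size=(6, 4))
ys = np.array([0, 0, 0, 1, 1, 1])
Xt = rng.normal(size=(3, 4))

G = np.eye(4)  # placeholder kernel, not a real geodesic flow kernel
D = kernel_distance(Xs, Xt, G)

# 1-NN under the kernel distance: nearest source sample for each target column.
pred = ys[np.argmin(D, axis=0)]

# With G = I the kernel distance reduces to squared Euclidean distance.
euclid = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(axis=-1)
```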

5. Transfer Subspace Learning (TSL)

Uses Bregman divergence (LogDet) for distribution adaptation
Iteratively optimizes subspace to minimize divergence
Maximizes variance while minimizing distribution mismatch
Uses --tsl-iter for internal optimization iterations

6. Joint Distribution Adaptation (JDA)

Adapts both marginal P(x) and conditional Q(y|x) distributions
Iteratively refines target pseudo-labels
Combines MMD for both marginal and conditional distributions
Uses --jda-iter for pseudo-label refinement iterations
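
The combined MMD term can be sketched as a single matrix: the marginal matrix plus one class-wise matrix per label, built from the source labels and the current target pseudo-labels. This is a simplified illustration under stated assumptions (no extra scaling or normalization of the terms, classes without pseudo-labeled target samples skipped); the function name is hypothetical and the repository may differ in these details.

```python
import numpy as np


def jda_mmd_matrix(ys, yt_pseudo):
    """Return M = M0 + sum_c Mc for source labels and target pseudo-labels."""
    ns, nt = len(ys), len(yt_pseudo)
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    M = np.outer(e, e)  # marginal term M0
    for c in np.unique(ys):
        src = np.flatnonzero(ys == c)
        tar = np.flatnonzero(yt_pseudo == c)
        if len(tar) == 0:
            continue  # no target sample is currently assigned to class c
        e = np.zeros(ns + nt)
        e[src] = 1.0 / len(src)
        e[ns + tar] = -1.0 / len(tar)
        M += np.outer(e, e)  # conditional term Mc
    return M


rng = np.random.default_rng(0)
Xs = rng.normal(size=(6, 3))
ys = np.array([0, 0, 0, 1, 1, 1])
Xt = rng.normal(size=(4, 3))
yt_pseudo = np.array([0, 0, 1, 1])  # e.g. from a 1-NN pass in the previous iteration

M = jda_mmd_matrix(ys, yt_pseudo)

# With a linear kernel, tr(K M) sums the squared gaps between the domain
# means and between the per-class means.
X = np.vstack([Xs, Xt])
joint_mmd = np.trace(X @ X.T @ M)
expected = np.sum((Xs.mean(axis=0) - Xt.mean(axis=0)) ** 2)
for c in (0, 1):
    gap = Xs[ys == c].mean(axis=0) - Xt[yt_pseudo == c].mean(axis=0)
    expected += np.sum(gap**2)
```

In each refinement iteration the matrix would be rebuilt from the latest pseudo-labels before the subspace is solved again.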

Data Preprocessing

Digit Datasets (USPS, MNIST)

Raw pixel values (no normalization)
Feature dimension varies by dataset

COIL Dataset

Raw image features
20 object classes, 72 images per object

PIE Dataset

Normalized to [0,1] by dividing by 255
Face images from different poses

Office SURF Dataset

Pre-extracted SURF features
Already z-score standardized

Hyperparameters

Dataset	Lambda	Notes
digit (USPS->MNIST)	0.1	Raw pixels work best
coil (COIL1->COIL2)	0.1	Raw pixels work best
pie (PIE->PIE)	0.1	Normalize to [0,1]
surf (Office)	1.0	SURF already standardized

Development

Running Tests

# Install dev dependencies
uv sync --dev

# Run tests
pytest

# Run with coverage
pytest --cov=. --cov-report=html

Code Style

This project follows PEP 8 guidelines:

# Format code
black .

# Lint code
flake8 .

License

This project is for educational and research purposes.
