A comprehensive framework for comparing transfer learning methods on domain adaptation tasks. Implements the NN, PCA, TCA, GFK, TSL, and JDA algorithms, following the original papers.
This framework reproduces experiments from the JDA (Joint Distribution Adaptation) paper and provides a unified interface for comparing different transfer learning methods.
Transfer learning leverages rich labeled data in a source domain to build an accurate classifier for a target domain. Most prior methods, however, do not simultaneously reduce the difference in both the marginal and the conditional distribution between domains. Joint Distribution Adaptation (JDA) jointly adapts both distributions within a principled dimensionality reduction procedure, constructing a feature representation that is effective and robust under substantial distribution shift. The original paper's experiments show JDA significantly outperforming several state-of-the-art methods on four types of cross-domain image classification problems.
| Method | Full Name | Reference |
|---|---|---|
| NN | Nearest Neighbor | Baseline |
| PCA | Principal Component Analysis | Baseline |
| TCA | Transfer Component Analysis | Pan et al., TNN 2011 |
| GFK | Geodesic Flow Kernel | Gong et al., CVPR 2012 |
| TSL | Transfer Subspace Learning | Si et al., TKDE 2010 |
| JDA | Joint Distribution Adaptation | Long et al., ICCV 2013 |
```
jda_project/
├── jda_comparison.py          # Main comparison framework
├── tune_parameters.py         # Parameter tuning and optimization
├── run_experiments.py         # Batch experiment runner
├── fig4_final.py              # Figure 4 generation
├── README.md                  # This file
├── .gitignore                 # Git ignore rules
├── Method Introduction.png    # JDA method overview diagram
├── data/                      # Dataset files
│   ├── digit/
│   │   └── MNIST_vs_USPS.mat
│   ├── coil/
│   │   ├── COIL_1.mat
│   │   └── COIL_2.mat
│   ├── pie/
│   │   ├── PIE1.mat
│   │   ├── PIE2.mat
│   │   ├── PIE3.mat
│   │   ├── PIE4.mat
│   │   └── PIE5.mat
│   ├── surf/
│   │   ├── amazon_zscore_SURF_L10.mat
│   │   ├── Caltech10_zscore_SURF_L10.mat
│   │   ├── dslr_zscore_SURF_L10.mat
│   │   └── webcam_zscore_SURF_L10.mat
│   ├── prepared_mnist_usps/   # Preprocessed MNIST-USPS data
│   └── prepared_pie/          # Preprocessed PIE data
└── paper_experiments/         # Paper experiment results and plots
```
```bash
# Clone the repository
git clone https://github.com/caroline1677/COMP7404-Group16-JDA-Reproduction.git
cd COMP7404-Group16-JDA-Reproduction

# Install dependencies with UV
uv sync

# Or install in development mode
uv pip install -e .
```

Alternatively, install the core dependencies directly:

```bash
pip install numpy scipy scikit-learn
```

Check out our interactive visualization of JDA results: https://visualization3.vercel.app/

Verify your installation by running a simple test:

```bash
python jda_comparison.py --dataset digit --src USPS --tar MNIST
```

Expected output:
```
============================================================
Transfer Learning Comparison: USPS -> MNIST
Dataset: digit, Dim: 100, Lambda: 0.1, JDA Iter: 10, TSL Iter: 10
============================================================
| Method | Accuracy | Runtime (s) |
|--------|----------|-------------|
| NN     | 64.44%   | 0.123       |
| PCA    | 65.06%   | 0.234       |
| GFK    | 31.89%   | 0.456       |
| TCA    | 58.11%   | 0.789       |
| TSL    | 58.94%   | 1.234       |
| JDA    | 72.44%   | 2.345       |
```
Place your data files in the following structure:

```
jda_project/
├── data/
│   ├── digit/
│   │   └── MNIST_vs_USPS.mat
│   ├── coil/
│   │   ├── COIL_1.mat
│   │   └── COIL_2.mat
│   ├── pie/
│   │   ├── PIE1.mat
│   │   ├── PIE2.mat
│   │   ├── PIE3.mat
│   │   ├── PIE4.mat
│   │   └── PIE5.mat
│   └── surf/
│       ├── amazon_zscore_SURF_L10.mat
│       ├── Caltech10_zscore_SURF_L10.mat
│       ├── dslr_zscore_SURF_L10.mat
│       └── webcam_zscore_SURF_L10.mat
```
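If you are unsure which variable names a `.mat` file uses (needed for the custom-mode `--src-feat`/`--src-label` flags), scipy can list them. This sketch writes a small demo file as a stand-in for a real dataset and then inspects it:

```python
import numpy as np
from scipy.io import loadmat, savemat

# Write a small demo .mat file (a stand-in for a real dataset file)
savemat("demo.mat", {"X": np.zeros((5, 3)), "y": np.arange(5)})

# List the non-metadata variables and their shapes;
# note: loadmat returns 1-D arrays as (1, n) row vectors
mat = loadmat("demo.mat")
names = {k: v.shape for k, v in mat.items() if not k.startswith("__")}
print(names)
```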
Run a single experiment using preset datasets:

```bash
python jda_comparison.py --dataset digit --src USPS --tar MNIST
python jda_comparison.py --dataset coil --src COIL1 --tar COIL2
python jda_comparison.py --dataset pie --src PIE1 --tar PIE4
python jda_comparison.py --dataset surf --src webcam --tar dslr
```

Use your own .mat files by specifying file paths and variable names:
```bash
python jda_comparison.py \
    --src-file data/source.mat --src-feat X --src-label y \
    --tar-file data/target.mat --tar-feat X --tar-label y \
    --dim 100 --lamb 0.1
```

| Argument | Type | Default | Description |
|---|---|---|---|
| **Data Input (choose one mode)** | | | |
| `--dataset` | str | None | Dataset type: digit, coil, pie, surf |
| `--src` | str | None | Source domain name (with `--dataset`) |
| `--tar` | str | None | Target domain name (with `--dataset`) |
| `--data-dir` | str | data | Path to data directory |
| `--src-file` | str | None | Path to source .mat file (custom mode) |
| `--src-feat` | str | None | Variable name for source features |
| `--src-label` | str | None | Variable name for source labels |
| `--tar-file` | str | None | Path to target .mat file (custom mode) |
| `--tar-feat` | str | None | Variable name for target features |
| `--tar-label` | str | None | Variable name for target labels |
| **Method Parameters** | | | |
| `--dim` | int | 100 | Subspace dimensionality |
| `--lamb` | float | 0.1 | Regularization (TCA/TSL/JDA) |
| `--iter` | int | 10 | Default iterations for JDA and TSL |
| `--jda-iter` | int | 10 | Iterations for JDA specifically |
| `--tsl-iter` | int | 10 | Iterations for TSL specifically |
| **Method-Specific Parameters** | | | |
| `--pca-dim` | int | dim | Dimensionality for PCA |
| `--gfk-dim` | int | dim | Dimensionality for GFK |
| `--tca-dim` | int | dim | Dimensionality for TCA |
| `--tca-lamb` | float | lamb | Regularization for TCA |
| `--tsl-dim` | int | dim | Dimensionality for TSL |
| `--tsl-lamb` | float | lamb | Regularization for TSL |
| `--jda-dim` | int | dim | Dimensionality for JDA |
| `--jda-lamb` | float | lamb | Regularization for JDA |
| **Output Options** | | | |
| `--methods` | str | all | Methods: 'all' or comma-separated (nn,pca,tca,gfk,tsl,jda) |
| `--parallel` | flag | False | Run methods in parallel (multi-threaded) |
| `--workers` | int | 4 | Number of parallel workers |
| `--output` | str | None | Save results to CSV file |
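As the table indicates, method-specific parameters fall back to the shared `--dim`/`--lamb` values when not given. A minimal sketch of that fallback pattern (illustrative, not the project's actual parsing code):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--dim", type=int, default=100)
parser.add_argument("--jda-dim", type=int, default=None)

args = parser.parse_args(["--dim", "50"])

# Fall back to the shared --dim when --jda-dim is not supplied
jda_dim = args.jda_dim if args.jda_dim is not None else args.dim
print(jda_dim)  # 50
```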
Run with custom parameters:

```bash
python jda_comparison.py --dataset surf --src webcam --tar dslr --dim 50 --lamb 1.0 --jda-iter 15
```

Run with method-specific parameters:

```bash
python jda_comparison.py --dataset digit --src USPS --tar MNIST \
    --methods pca,gfk,tca,tsl,jda \
    --pca-dim 40 \
    --gfk-dim 60 \
    --tca-dim 50 --tca-lamb 0.1 \
    --tsl-dim 80 --tsl-lamb 1.0 \
    --jda-dim 100 --jda-lamb 0.1
```

Run only specific methods:

```bash
python jda_comparison.py --dataset coil --src COIL1 --tar COIL2 --methods nn,pca,jda
```

Run in parallel (recommended for large datasets):

```bash
python jda_comparison.py --dataset pie --src PIE1 --tar PIE4 --parallel --workers 4
```

Save results to CSV:

```bash
python jda_comparison.py --dataset pie --src PIE1 --tar PIE4 --output results.csv
```

Run multiple experiments from a config file:

```bash
python run_experiments.py experiments_config.csv full_results.csv
```

Example config for preset datasets:

```
dataset,src,tar,dim,lamb,iter
digit,USPS,MNIST,100,0.1,10
coil,COIL1,COIL2,100,0.1,10
pie,PIE1,PIE4,100,0.1,10
surf,webcam,dslr,100,1.0,10
```

Example config for custom .mat files:

```
src_file,src_feat,src_label,tar_file,tar_feat,tar_label,dim,lamb,iter,jda_iter,tsl_iter
data/source1.mat,X,Y,data/target1.mat,X,Y,100,0.1,10,10,10
data/source2.mat,features,labels,data/target2.mat,features,labels,100,0.1,15,15,5
```

Results are saved in CSV format with accuracy and runtime for each method:

```
Task,NN_Acc,NN_Time,PCA_Acc,PCA_Time,GFK_Acc,GFK_Time,TCA_Acc,TCA_Time,TSL_Acc,TSL_Time,JDA_Acc,JDA_Time
USPS -> MNIST,64.44,0.123,65.06,0.234,31.89,0.456,58.11,0.789,58.94,1.234,72.44,2.345
```

Perform grid search to find optimal hyperparameters for each method:
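A results file in this format is easy to post-process. A small sketch that picks the best method per task (the column subset here is illustrative):

```python
import csv, io

# Illustrative results row in the CSV format shown above
results_csv = """Task,NN_Acc,NN_Time,PCA_Acc,PCA_Time,JDA_Acc,JDA_Time
USPS -> MNIST,64.44,0.123,65.06,0.234,72.44,2.345
"""

for row in csv.DictReader(io.StringIO(results_csv)):
    # Collect the accuracy columns, keyed by method name
    accs = {k[:-4]: float(v) for k, v in row.items() if k.endswith("_Acc")}
    best = max(accs, key=accs.get)
    print(f"{row['Task']}: best = {best} ({accs[best]:.2f}%)")
```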
```bash
# Tune all methods (sequential)
python tune_parameters.py --dataset digit --src USPS --tar MNIST

# Tune specific methods
python tune_parameters.py --dataset digit --src USPS --tar MNIST --methods pca,gfk,tca

# Compare with original paper results (find parameters closest to paper accuracy)
python tune_parameters.py --dataset digit --src USPS --tar MNIST --compare-paper

# Run with parallel parameter search (faster for large parameter grids);
# methods run sequentially (one at a time) to avoid mixed output, but the
# internal parameter search is parallelized across multiple workers
python tune_parameters.py --dataset digit --src USPS --tar MNIST --parallel --workers 4

# PIE dataset with 8 workers
python tune_parameters.py --dataset pie --src PIE2 --tar PIE5 --parallel --workers 8
```

- Sequential (default): each method runs one after another; within each method, parameter settings are tested one by one.
- Parallel (`--parallel --workers N`): each method still runs one after another (to keep output clean), but within each method the parameter grid search is parallelized across N threads.
Example with `--workers 4`:
- PCA: Tests k=10,20,30,...,150 in parallel (4 at a time)
- Then GFK: Tests k=10,20,30,...,150 in parallel (4 at a time)
- Then TCA: Tests 15*3=45 combinations in parallel (4 at a time)
- And so on...
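A grid search of this shape can be sketched with a thread pool; `evaluate()` below is a hypothetical stand-in for training and scoring one parameter setting:

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(k):
    # Toy score that peaks at k=80; the real search would train and
    # test one method at subspace dimensionality k
    return {"k": k, "acc": 100.0 - abs(k - 80) * 0.1}

ks = range(10, 160, 10)  # k = 10, 20, ..., 150
with ThreadPoolExecutor(max_workers=4) as pool:  # 4 settings in flight at once
    results = list(pool.map(evaluate, ks))

best = max(results, key=lambda r: r["acc"])
print(best)  # {'k': 80, 'acc': 100.0}
```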
| Method | k Range | λ Values |
|---|---|---|
| PCA | 10,20,30,...,200 | - |
| GFK | 10,20,30,...,200 | - |
| TCA | 10,20,30,...,200 | 0.01, 0.1, 1.0 |
| TSL | 10,20,30,...,150 | 0.01, 0.1, 1.0 |
| JDA | 10,20,30,...,150 | 0.01, 0.1, 1.0 |
Note: the time shown is the average per experiment.
**NN**: Baseline classifier using 1-NN with Euclidean distance.

**PCA**: Dimensionality reduction using PCA, followed by 1-NN classification.
**TCA**:
- Adapts only the marginal distribution P(x)
- Uses MMD (Maximum Mean Discrepancy) to measure distribution distance
- Learns transfer components in an RKHS
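The marginal MMD term is typically encoded as a coefficient matrix M0 so the objective becomes a trace expression over the projected data. A minimal sketch of the standard construction (source-source entries 1/ns², target-target 1/nt², cross-domain -1/(ns·nt)):

```python
import numpy as np

def mmd_matrix(ns, nt):
    """Marginal MMD coefficient matrix M0 for ns source and nt target samples."""
    e = np.vstack([np.full((ns, 1), 1.0 / ns),    # +1/ns for source rows
                   np.full((nt, 1), -1.0 / nt)])  # -1/nt for target rows
    return e @ e.T  # (ns+nt) x (ns+nt)

M0 = mmd_matrix(3, 2)
# M0[0, 0] = 1/9, M0[3, 3] = 1/4, M0[0, 3] = -1/6
print(M0[0, 0], M0[3, 3], M0[0, 3])
```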
**GFK**:
- Manifold-based domain adaptation
- Computes the geodesic flow between source and target subspaces
- Uses kernel distance for classification: D = diag(K_ss) + diag(K_tt) - 2*K_st
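The kernel distance above can be computed directly. A minimal sketch, assuming projected source/target features `Zs`, `Zt` and a kernel matrix `G`:

```python
import numpy as np

def kernel_distance(Zs, Zt, G):
    """Squared distance induced by a GFK-style kernel G:
    D[i, j] = zs_i^T G zs_i + zt_j^T G zt_j - 2 * zs_i^T G zt_j"""
    Kss = np.einsum("id,de,ie->i", Zs, G, Zs)  # diag(K_ss)
    Ktt = np.einsum("jd,de,je->j", Zt, G, Zt)  # diag(K_tt)
    Kst = Zs @ G @ Zt.T                        # K_st
    return Kss[:, None] + Ktt[None, :] - 2 * Kst

# Sanity check: with G = I this is ordinary squared Euclidean distance
Zs = np.array([[0.0, 0.0]])
Zt = np.array([[3.0, 4.0]])
D = kernel_distance(Zs, Zt, np.eye(2))
print(D)  # [[25.]]
```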
**TSL**:
- Uses Bregman divergence (LogDet) for distribution adaptation
- Iteratively optimizes the subspace to minimize divergence
- Maximizes variance while minimizing distribution mismatch
- Uses `--tsl-iter` for internal optimization iterations
**JDA**:
- Adapts both the marginal P(x) and the conditional Q(y|x) distribution
- Iteratively refines target pseudo-labels
- Combines MMD terms for both the marginal and conditional distributions
- Uses `--jda-iter` for pseudo-label refinement iterations
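The refinement loop can be sketched schematically as follows; `solve_jda` here is a hypothetical placeholder for the generalized eigenproblem that produces the projection A:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def jda_schematic(Xs, ys, Xt, solve_jda, n_iter=10):
    """Outline of JDA's loop: solve for the projection using the current
    pseudo-labels, then re-estimate pseudo-labels in the new subspace."""
    pseudo = None
    for _ in range(n_iter):
        # The conditional MMD term inside solve_jda needs the
        # current target pseudo-labels (None on the first pass)
        A = solve_jda(Xs, ys, Xt, pseudo)
        Zs, Zt = Xs @ A, Xt @ A
        # Re-estimate target pseudo-labels with 1-NN in the new subspace
        pseudo = KNeighborsClassifier(n_neighbors=1).fit(Zs, ys).predict(Zt)
    return A, pseudo
```

Plugging in an identity "solver" recovers plain iterated 1-NN, which is a handy sanity check.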
**Digit (MNIST/USPS)**:
- Raw pixel values (no normalization)
- Feature dimension varies by dataset

**COIL**:
- Raw image features
- 20 object classes, 72 images per object

**PIE**:
- Normalized to [0,1] by dividing by 255
- Face images from different poses

**SURF (Office/Caltech)**:
- Pre-extracted SURF features
- Already z-score standardized
| Dataset | Lambda | Notes |
|---|---|---|
| digit (USPS->MNIST) | 0.1 | Raw pixels work best |
| coil (COIL1->COIL2) | 0.1 | Raw pixels work best |
| pie (PIE->PIE) | 0.1 | Normalize to [0,1] |
| surf (Office) | 1.0 | SURF already standardized |
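As an illustration of the notes above, a tiny preprocessing helper (hypothetical, not the framework's actual function):

```python
import numpy as np

def preprocess(X, dataset):
    """Apply the per-dataset preprocessing recommended above."""
    if dataset == "pie":
        return X / 255.0  # raw pixels scaled to [0, 1]
    # digit/coil use raw pixels; surf features are already z-scored
    return X

X = np.array([[0.0, 127.5, 255.0]])
out = preprocess(X, "pie")
print(out)
```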
```bash
# Install dev dependencies
uv sync --dev

# Run tests
pytest

# Run with coverage
pytest --cov=. --cov-report=html
```

This project follows PEP 8 guidelines:

```bash
# Format code
black .

# Lint code
flake8 .
```

This project is for educational and research purposes.
