This repository is part of my master's thesis on Entity Matching (EM). The project adapts pre-trained transformers to EM with a parameter-efficient method, Low-Rank Adaptation (LoRA). By updating only a small subset of parameters, the approach achieves strong performance on unlabeled target datasets by aligning the feature distributions of a labeled source dataset and an unlabeled target dataset.
- 🔍 Problem: Entity Matching (EM) requires labeled training data, which is expensive to obtain. This project explores domain adaptation to transfer knowledge from labeled source datasets to unlabeled target datasets.
- 💡 Solution: Uses LoRA (Low-Rank Adaptation) for efficient fine-tuning and MK-MMD (Multiple Kernel Maximum Mean Discrepancy) for domain alignment.
- ✨ Key Features:
- ✅ Efficient fine-tuning with LoRA (only 1% additional parameters).
- ✅ Domain adaptation to align source and target feature distributions.
- ✅ A trade-off between task learning and domain adaptation, controlled by a dynamic α function (sketched below).
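To make the moving parts concrete, here is a minimal sketch of one training step combining the three ingredients above, assuming a PyTorch stack with Hugging Face `transformers` and `peft`. The `roberta-base` backbone, the LoRA hyperparameters, the DANN-style α schedule, and all function names are illustrative assumptions, not this repository's actual API:

```python
import math

import torch
import torch.nn.functional as F
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification


def gaussian_mk_mmd(source, target, num_kernels=5, mul_factor=2.0):
    """MK-MMD estimate with Gaussian kernels whose bandwidths are spread
    geometrically around the median pairwise squared distance."""
    n = source.size(0)
    feats = torch.cat([source, target], dim=0)
    sq_dists = torch.cdist(feats, feats).pow(2)
    sigma = sq_dists.detach().median()  # median heuristic for the central bandwidth
    bandwidths = [sigma * mul_factor ** (i - num_kernels // 2) for i in range(num_kernels)]
    kernel = sum(torch.exp(-sq_dists / (2.0 * b + 1e-8)) for b in bandwidths)
    k_ss, k_tt, k_st = kernel[:n, :n], kernel[n:, n:], kernel[:n, n:]
    return k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean()


def dynamic_alpha(step, total_steps, gamma=10.0):
    """DANN-style ramp of the adaptation weight from 0 to 1 over training
    (an assumption here; the thesis may use a different schedule)."""
    p = step / max(total_steps, 1)
    return 2.0 / (1.0 + math.exp(-gamma * p)) - 1.0


# Wrap a pre-trained backbone with LoRA adapters; only the low-rank
# matrices (a small fraction of all weights) are trained.
base = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
model = get_peft_model(base, LoraConfig(r=8, lora_alpha=16,
                                        target_modules=["query", "value"]))


def training_step(batch_src, batch_tgt, step, total_steps):
    """Task loss on labeled source pairs plus alpha-weighted MK-MMD between
    source and target [CLS] features."""
    out_src = model(**batch_src, output_hidden_states=True)
    tgt_inputs = {k: v for k, v in batch_tgt.items() if k != "labels"}
    out_tgt = model(**tgt_inputs, output_hidden_states=True)
    feat_src = out_src.hidden_states[-1][:, 0]  # [CLS] embedding
    feat_tgt = out_tgt.hidden_states[-1][:, 0]
    task_loss = F.cross_entropy(out_src.logits, batch_src["labels"])
    alpha = dynamic_alpha(step, total_steps)
    return task_loss + alpha * gaussian_mk_mmd(feat_src, feat_tgt)
```

The median-heuristic bandwidth with geometrically spaced multipliers is one common way to build the Gaussian kernel bank for MK-MMD; the exact kernels and schedule used in the thesis may differ.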
```
├── config/                 # YAML config files for training & hyperparameter tuning
├── dataset/                # Data processing and dataloaders
├── src/
│   ├── models/             # LoRA and fine-tuning models
│   ├── divergences/        # Domain adaptation losses (MK-MMD, Gaussian kernels)
│   ├── training/           # Training scripts
│   └── evaluation/         # Evaluation metrics and scripts
├── scripts/
│   ├── train.sh            # Training script
│   └── run_experiments.sh  # Batch execution for different datasets
├── Dockerfile              # Docker setup for containerized execution
├── requirements.txt        # Required Python dependencies
├── setup.sh                # Setup script for dependencies (CUDA support included)
└── README.md               # Project documentation
```
The `run_experiments.sh` script:
- Loops over the source/target dataset pairs experimented with in the paper, using 3 random seeds (a sketch of this loop is shown below).
- Pass the `--hparam_tuning` flag to enable hyperparameter tuning.
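For illustration only, the experiment loop could be implemented along these lines in Python. The dataset names are placeholders, and the `--seed` flag of `train.sh` is an assumption made for the sketch, not a documented option:

```python
# Hypothetical sketch of the batch-experiment loop; dataset names are
# placeholders, and train.sh's --seed flag is assumed, not confirmed.
import itertools
import subprocess

SOURCES = ["<source_dataset_a>", "<source_dataset_b>"]  # placeholder names
TARGETS = ["<target_dataset_a>", "<target_dataset_b>"]  # placeholder names
SEEDS = [0, 1, 2]  # 3 random seeds, as described above

for src, tgt, seed in itertools.product(SOURCES, TARGETS, SEEDS):
    if src == tgt:
        continue  # skip trivial same-dataset pairs
    subprocess.run(
        ["bash", "scripts/train.sh",
         "--src", src, "--tgt", tgt, "--model", "lora",
         "--seed", str(seed)],  # assumed flag
        check=True,
    )
```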
This project is containerized using Docker for consistency and reproducibility.
To set up and run the model inside a Docker container:
```bash
# Build the image
docker build -t thesis-peft .

# Run training with LoRA on a chosen source/target pair
docker run --gpus all --rm -v $(pwd):/app thesis-peft bash train.sh --src <source_dataset> --tgt <target_dataset> --model lora

# Same run with hyperparameter tuning enabled
docker run --gpus all --rm -v $(pwd):/app thesis-peft bash train.sh --src <source_dataset> --tgt <target_dataset> --model lora --hparam-tuning
```