# Leveraging Foundation Vision Models with QLoRA and DoRA for Dairy Cattle Behavior Recognition
This repository contains the official implementation of our paper exploring Parameter-Efficient Fine-Tuning (PEFT) methods for adapting billion-parameter vision models to livestock behavior classification.
- 83.16% test accuracy using QLoRA with only 2.72% trainable parameters
- 65% reduction in training time compared to training from scratch
- 1:98 training-to-test ratio: 2,160 verified images generalize to 211,800+ real-world samples
- Underfitting, not overfitting, is the primary challenge when adapting foundation models to agricultural imagery
| Method | Target Modules | Rank | Trainable Params | Training Time | Test Accuracy |
|---|---|---|---|---|---|
| ResNet-18 (scratch) | – | – | 11.2M (100%) | 16h 45m | 72.87% |
| ViT-Small (scratch) | – | – | 21.7M (100%) | 18h 39m | 61.91% |
| DINOv3 (frozen) | – | – | 4.7M (0.07%) | 17h 27m | 76.56% |
| QLoRA | q_proj | 8 | 2.6M (0.04%) | 6h 32m | 77.17% |
| QLoRA | q_proj | 16 | 5.2M (0.08%) | 7h 16m | 78.38% |
| QLoRA | all-linear | 16 | 46.8M (0.70%) | 4h 43m | 80.40% |
| QLoRA | all-linear | 64 | 183.0M (2.72%) | 5h 46m | 83.16% |
| DoRA | q_proj | 8 | 2.8M (0.04%) | 11h 31m | 81.53% |
| DoRA | q_proj | 16 | 5.4M (0.08%) | 10h 27m | 81.03% |
| DoRA | all-linear | 16 | 48.4M (0.72%) | 11h 51m | 81.23% |
| DoRA | all-linear | 64 | 184.5M (2.75%) | 10h 59m | 83.14% |
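The parameter savings in the table come from the low-rank update that LoRA-style methods share: the frozen weight `W0` is augmented with a trainable product `B @ A` of rank `r`, so only `r * (d_in + d_out)` parameters train per layer. A minimal NumPy sketch, using illustrative dimensions rather than the actual DINOv3 layer sizes:

```python
import numpy as np

# Illustrative dimensions only -- not the actual DINOv3 layer shapes.
d, r = 1024, 8
rng = np.random.default_rng(0)

W0 = rng.standard_normal((d, d))         # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                     # zero-initialised, so W_eff == W0 at step 0
alpha = 16                               # LoRA scaling factor

W_eff = W0 + (alpha / r) * (B @ A)       # effective weight seen by the forward pass

full_params = d * d                      # 1,048,576 params if trained fully
lora_params = r * (2 * d)                # 16,384 params -- roughly 1.6% of the layer
```

This is why moving from `q_proj` only to `all-linear` targets, or raising the rank from 8 to 64, grows the trainable-parameter count in the table while still staying at a small fraction of the 6.7B-scale backbone.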
```
PEFT-Fine-tuning-cows/
├── assets/
│   └── infographics.png
├── notebooks/
│   ├── DoRA Fewer Layers R=8.ipynb
│   ├── DoRA Fewer Layers R=16.ipynb
│   ├── DoRA More Layers R=16.ipynb
│   ├── DoRA More Layers R=64.ipynb
│   ├── Q-lora Fewer Layers R = 8.ipynb
│   ├── Q-lora Fewer Layers R = 16.ipynb
│   ├── Q-lora More Layers Rank = 16.ipynb
│   ├── Q-lora More Layers Rank = 64.ipynb
│   ├── TrainFromScatch_Preprocesing Pipeline Val = 1.ipynb
│   └── UsingPretrainedModel_DinoV3 Embeddings Val = 1.ipynb
├── environment.yml
├── requirements.txt
├── LICENSE
└── README.md
```
| Notebook | Description |
|---|---|
| TrainFromScatch_Preprocesing Pipeline Val = 1 | Training ResNet-18/ViT-Small from scratch with the data preprocessing pipeline |
| UsingPretrainedModel_DinoV3 Embeddings Val = 1 | Frozen DINOv3 feature extraction with classification head |
| Notebook | Target Modules | Rank | Test Accuracy |
|---|---|---|---|
| Q-lora Fewer Layers R = 8 | q_proj | 8 | 77.17% |
| Q-lora Fewer Layers R = 16 | q_proj | 16 | 78.38% |
| Q-lora More Layers Rank = 16 | all-linear | 16 | 80.40% |
| Q-lora More Layers Rank = 64 | all-linear | 64 | 83.16% |
| Notebook | Target Modules | Rank | Test Accuracy |
|---|---|---|---|
| DoRA Fewer Layers R=8 | q_proj | 8 | 81.53% |
| DoRA Fewer Layers R=16 | q_proj | 16 | 81.03% |
| DoRA More Layers R=16 | all-linear | 16 | 81.23% |
| DoRA More Layers R=64 | all-linear | 64 | 83.14% |
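DoRA differs from QLoRA in how the update is applied: it decomposes each weight into a trainable per-column magnitude and a direction, where the direction carries the LoRA-style low-rank update. A NumPy sketch of that decomposition, again with illustrative sizes rather than the real DINOv3 shapes:

```python
import numpy as np

# Illustrative sizes only -- not the actual DINOv3 layer shapes.
d_out, d_in, r = 64, 64, 8
rng = np.random.default_rng(0)

W0 = rng.standard_normal((d_out, d_in))   # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01
B = rng.standard_normal((d_out, r)) * 0.01

V = W0 + B @ A                            # direction: LoRA-style low-rank update
col_norm = np.linalg.norm(V, axis=0)      # per-column norms of the direction
m = np.linalg.norm(W0, axis=0)            # trainable magnitude, initialised from W0

W_dora = (V / col_norm) * m               # magnitude * unit direction
```

The extra magnitude vector is why each DoRA row in the tables reports slightly more trainable parameters than its QLoRA counterpart at the same rank (e.g. 2.8M vs. 2.6M at r=8).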
Model checkpoints are available on Hugging Face:
| Model | Description | Test Accuracy | Link |
|---|---|---|---|
| DINOv3 + QLoRA (r=64) | Best performing model | 83.16% | 🤗 cow-behavior-dinov3-qlora-r64 |
| DINOv3 + DoRA (r=64) | Best DoRA configuration | 83.14% | 🤗 cow-behavior-dinov3-dora-r64 |
| DINOv3 Frozen | Frozen feature extraction | 76.56% | 🤗 cow-behavior-dinov3-Frozen |
| ResNet-18 (scratch) | Baseline trained from scratch | 72.87% | 🤗 cow-behavior-from-scratch |
```python
from transformers import AutoModel
from peft import PeftModel

# Load the QLoRA adapter (best performing) on top of the DINOv3 base model
base_model = AutoModel.from_pretrained("facebook/dinov3-vit7b16-pretrain-lvd1689m")
model = PeftModel.from_pretrained(base_model, "Sonam5/cow-behavior-dinov3-qlora-r64")
```

Our dataset consists of nine dairy cow behaviors:
| Behavior | Training Samples | Test Samples |
|---|---|---|
| Drinking | 240 | 3,011 |
| Feeding head down | 240 | 30,952 |
| Feeding head up | 240 | 18,783 |
| Lying | 240 | 83,509 |
| Standing | 240 | 69,807 |
| Walking | 240 | 3,819 |
| Frontal pushing | 240 | 600 |
| Gallop | 240 | 575 |
| Leap | 240 | 744 |
Total: 2,160 training images vs. 211,800 test samples (1:98 ratio)
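The headline 1:98 ratio follows directly from the table above: a balanced training set of 240 images per class against a heavily imbalanced real-world test set. A quick sanity check of those numbers:

```python
# Per-class test counts taken from the dataset table above.
test_counts = {
    "Drinking": 3011, "Feeding head down": 30952, "Feeding head up": 18783,
    "Lying": 83509, "Standing": 69807, "Walking": 3819,
    "Frontal pushing": 600, "Gallop": 575, "Leap": 744,
}

train_total = 240 * len(test_counts)   # 240 images per class -> 2,160
test_total = sum(test_counts.values()) # 211,800
ratio = round(test_total / train_total)  # ~98, i.e. the 1:98 split
```

Note that while training is class-balanced, the test distribution is not: "Lying" alone accounts for nearly 40% of test samples, while "Gallop" contributes under 0.3%.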
- Python 3.12
- CUDA-compatible GPU with 16GB+ VRAM (Tesla V100 or equivalent)
```bash
# Clone the repository
git clone https://github.com/YOUR_USERNAME/PEFT-Fine-tuning-cows.git
cd PEFT-Fine-tuning-cows

# Create conda environment
conda env create -f environment.yml
conda activate cow-behavior-analysis
```

Alternatively, install with pip:

```bash
# Clone the repository
git clone https://github.com/YOUR_USERNAME/PEFT-Fine-tuning-cows.git
cd PEFT-Fine-tuning-cows

# Install dependencies
pip install -r requirements.txt
```

If you find this work useful, please cite our paper:
```bibtex
@article{yang2025peft_cattle,
  title={When Billion-Parameter Foundation Models Meet Limited Data: Parameter-Efficient Fine-Tuning with QLoRA and DoRA for Generalizable Image Classification},
  author={Yang, Haiyu and Sharma, Sumit and Liu, Enhong and Hostens, Miel},
}
```

Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- DINOv3 - Foundation model
- PEFT - Parameter-efficient fine-tuning library
- bitsandbytes - Quantization library
**Haiyu Yang** - Cornell University
