LingjieKong-fdu/CustAny

CustAny: Customizing Anything from A Single Example (CVPR2025 Oral)

[Paper]   [Project Page]  

Core Properties:

  1. Generality: Applicable to general objects, including but not limited to human faces, clothes, tools, animals, ...
  2. Fast generation: Achieves object customization within seconds, without any further fine-tuning.
  3. User friendly: A single reference image and a text prompt are all that is needed for customization.
  4. Outstanding results: Ensures high ID fidelity, flexible text editability, and high image quality.
  5. Various applications: Supports diverse applications such as general object customization, virtual try-on, ID-mixing, ...

General ID Dataset:

To promote research on general object customization, we construct the first large-scale general ID dataset, named the Multi-Category ID-Consistent (MC-IDC) dataset. Our dataset consists of approximately 315,000 samples spanning more than 10,000 categories, covering various types such as human faces, animals, clothes, human-made tools, etc. Each sample consists of a reference image, a segmentation mask of the object of interest in the reference image, a target image, and a text caption of the target image. The reference image with its segmentation mask provides ID information, the text caption of the target image offers semantic-level guidance for generation, and the target image serves as the ground truth.
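The four-part sample structure described above can be sketched as a small Python helper. Note this is only an illustrative sketch: the directory layout and file names (`reference.jpg`, `mask.png`, `target.jpg`, `caption.txt`) are assumptions for demonstration, not the actual on-disk format of the released MC-IDC dataset.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class MCIDCSample:
    """One MC-IDC sample: reference image + mask give ID info,
    the caption guides generation, the target image is ground truth."""
    reference: Path  # reference image of the object
    mask: Path       # segmentation mask of the object of interest
    target: Path     # target image (ground truth)
    caption: Path    # text caption of the target image


def sample_paths(root: str, sample_id: str) -> MCIDCSample:
    """Build the (hypothetical) file paths for one sample ID."""
    base = Path(root) / sample_id
    return MCIDCSample(
        reference=base / "reference.jpg",
        mask=base / "mask.png",
        target=base / "target.jpg",
        caption=base / "caption.txt",
    )
```

A training loader would iterate sample IDs, read the four files per sample, and feed (reference, mask, caption) as conditioning with the target as the reconstruction objective.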


Method Framework:

CustAny consists of three crucial ID processing modules: the General ID Extraction Module, the Dual-Level ID Injection Module, and the ID-Aware Decoupling Module.


Comparisons with Previous Works:

CustAny delivers high-quality customization for general objects, and even beats task-specialized methods in their own domains, such as human customization and virtual try-on, in terms of ID fidelity and text editability.


Download:

  • You can download the dataset through the link below. The dataset contains 350,000 samples.
  • Due to copyright and licensing restrictions, we have made partial modifications to the dataset originally presented in our paper. This makes the currently publicly available dataset slightly different from the one in the paper. These adjustments ensure compliance with intellectual property guidelines while preserving the core structure and utility of the dataset for research purposes.
  • To maintain robust training performance and generalize across diverse scenarios, we have expanded the dataset by incorporating new samples. These additions follow the same rigorous construction pipeline as the original dataset, ensuring consistency in quality and methodology.
  • We are committed to supporting ongoing research by continuously enriching the dataset. Plans are underway to integrate additional samples in the future, which will further enhance its scope and utility. Stay tuned for updates as we strive to facilitate groundbreaking advancements in the field.
Data Link: https://pan.baidu.com/s/1IM6dDhyF2iF2Hk41aidC0g
Extraction Code: 3e12

License and Agreement:

  • The dataset is made available under Creative Commons BY-NC 4.0. You can use, redistribute, and adapt the material for non-commercial purposes, as long as you give appropriate credit by citing our paper and indicate any changes that you've made.
  • The dataset is available for non-commercial research purposes only.
  • You agree not to reproduce, duplicate, copy, sell, trade, resell, or exploit for any commercial purpose any part of the samples or any derived data.
  • All the original images in this dataset are sourced from the Internet or from public datasets and are not the property of our institutions. The text captions of the images were generated by an open-source large-scale vision-language model. Our institutions are not responsible for the content or the meaning of these images or text captions.

BibTeX

If you find CustAny useful for your research and applications, please cite using this BibTeX:

@article{kong2024anymaker,
  title={AnyMaker: Zero-shot General Object Customization via Decoupled Dual-Level ID Injection},
  author={Kong, Lingjie and Wu, Kai and Hu, Xiaobin and Han, Wenhui and Peng, Jinlong and Xu, Chengming and Luo, Donghao and Zhang, Jiangning and Wang, Chengjie and Fu, Yanwei},
  journal={arXiv preprint arXiv:2406.11643},
  year={2024}
}

About

Official code for CustAny: Customizing Anything from A Single Example. Accepted by CVPR2025 (Oral)
