- General field: Applicable to general objects, including but not limited to human faces, clothes, tools, animals, ...
- Fast generation: Achieve object customization within seconds, without any further fine-tuning.
- User friendly: A single reference image and a text prompt are all that is needed for customization.
- Outstanding results: Ensure high ID fidelity, flexible text editability and high quality.
- Various applications: Diverse applications such as general object customization, virtual try-on, ID-mixing, ...
To promote research on general object customization, we construct the first large-scale general ID dataset, named the Multi-Category ID-Consistent (MC-IDC) dataset. It consists of approximately 315,000 samples spanning more than 10,000 categories, covering human faces, animals, clothes, human-made tools, and more. Each sample consists of a reference image, a segmentation mask of the object of interest in the reference image, a target image, and a text caption of the target image. The reference image and its segmentation mask provide the ID information, the caption of the target image offers semantic-level guidance for generation, and the target image serves as the ground truth.
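To make the four-part sample structure concrete, here is a minimal sketch of how one MC-IDC sample could be held in memory. The class name `MCIDCSample`, its field names, and the pixel-grid image representation are hypothetical and chosen only for illustration; they do not describe the format of the released files. The `masked_reference` helper reflects the description above (the reference image plus its mask carries the ID information), but blanking out background pixels is an assumed preprocessing step, not necessarily what CustAny does internally.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical in-memory layout (assumption, not the released file format):
# images are H x W grids of (R, G, B) tuples; the mask is an H x W grid of 0/1.
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[int]]


@dataclass
class MCIDCSample:
    """One hypothetical MC-IDC sample: reference, mask, target, caption."""
    reference: Image  # provides the ID information
    mask: Mask        # segmentation mask of the object of interest in `reference`
    target: Image     # ground-truth image for generation
    caption: str      # text caption of `target` (semantic-level guidance)

    def __post_init__(self) -> None:
        # The mask must align pixel-for-pixel with the reference image.
        if len(self.mask) != len(self.reference) or any(
            len(m_row) != len(r_row)
            for m_row, r_row in zip(self.mask, self.reference)
        ):
            raise ValueError("mask and reference image must have the same shape")

    def masked_reference(self, background: Pixel = (0, 0, 0)) -> Image:
        """Keep object pixels of the reference; replace the rest with `background`."""
        return [
            [px if keep else background for px, keep in zip(r_row, m_row)]
            for r_row, m_row in zip(self.reference, self.mask)
        ]


if __name__ == "__main__":
    ref = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (9, 9, 9)]]
    sample = MCIDCSample(ref, [[1, 0], [0, 1]], target=ref, caption="a red toy on a table")
    print(sample.masked_reference())
    # [[(255, 0, 0), (0, 0, 0)], [(0, 0, 0), (9, 9, 9)]]
```

Validating the mask shape up front is a design choice for the sketch: a misaligned mask would silently leak background pixels into the ID information.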
CustAny consists of three crucial ID processing modules: the General ID Extraction Module, the Dual-Level ID Injection Module, and the ID-Aware Decoupling Module.
CustAny delivers high-quality customization of general objects and even outperforms task-specialized methods in specific domains, such as human customization and virtual try-on, in terms of ID fidelity and text editability.
- You can download the dataset through the following link. The dataset contains 350,000 samples.
- Due to copyright and licensing restrictions, we have modified part of the dataset originally presented in our paper, so the publicly available version differs slightly from the one described in the paper. These adjustments ensure compliance with intellectual property guidelines while preserving the core structure and utility of the dataset for research purposes.
- To maintain robust training performance and generalization across diverse scenarios, we have expanded the dataset with new samples. These additions follow the same construction pipeline as the original dataset, ensuring consistent quality and methodology.
- We are committed to supporting ongoing research by continuously enriching the dataset, and we plan to add more samples in the future to further broaden its scope and utility. Stay tuned for updates.
Data Link: https://pan.baidu.com/s/1IM6dDhyF2iF2Hk41aidC0g
Extraction Code: 3e12

- The dataset is made available under Creative Commons BY-NC 4.0. You can use, redistribute, and adapt the material for non-commercial purposes, as long as you give appropriate credit by citing our paper and indicate any changes that you've made.
- The dataset is available for non-commercial research purposes only.
- You agree not to reproduce, duplicate, copy, sell, trade, resell, or exploit any part of the samples or any derived data for commercial purposes.
- All original images in this dataset are sourced from the Internet or public datasets and are not the property of our institutions. The text captions were generated by an open-source large-scale vision-language model. Our institutions are not responsible for the content or meaning of these images or captions.
If you find CustAny useful for your research and applications, please cite using this BibTeX:
@article{kong2024anymaker,
  title={AnyMaker: Zero-shot General Object Customization via Decoupled Dual-Level ID Injection},
  author={Kong, Lingjie and Wu, Kai and Hu, Xiaobin and Han, Wenhui and Peng, Jinlong and Xu, Chengming and Luo, Donghao and Zhang, Jiangning and Wang, Chengjie and Fu, Yanwei},
  journal={arXiv preprint arXiv:2406.11643},
  year={2024}
}