This repository contains the official code for the research project "Masked Visual-Tactile Pre-training for Robot Manipulation," presented at ICRA 2024. The project enhances robotic manipulation through a pre-training approach that integrates visual and tactile information.
- Project Webpage: https://lqts.github.io/M2VTP/
- Research Paper: IEEE | ResearchGate
- Demo Video: Bilibili
This repository provides the following functionalities:
- Environment Setup: Instructions for configuring the necessary environment.
- Pre-trained Model Integration: Code for importing and utilizing pre-trained models.
- Downstream Task Training: Scripts for training models on specific manipulation tasks.
- Model Evaluation: Tools for testing trained models and visualizing training strategies.
To set up the environment, please refer to the detailed instructions in the Dependencies section.
You can access the pre-trained model code here, and the pre-trained models themselves can be downloaded from this link. Place the downloaded model and configuration files in the model/vitac/model_and_config directory. If you store them elsewhere, update the paths in model/backbones/pre_model.py as shown below:
```python
MODEL_REGISTRY = {
    "vt20t-reall-tmr05-bin-ft+dataset-BottleCap": {
        "config": "model/vitac/model_and_config/vt20t-reall-tmr05-bin-ft+dataset-BottleCap.json",
        "checkpoint": "model/vitac/model_and_config/vt20t-reall-tmr05-bin-ft+dataset-BottleCap.pt",
        "cls": VTT_ReAll,
    }
}
```
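The snippet below is a minimal sketch of how such a registry entry might be consumed. The `load_pretrained` helper is hypothetical (not part of the repository's API), and it assumes the JSON config maps directly to constructor keyword arguments and that the checkpoint is a plain state dict; see model/backbones/pre_model.py for the actual loading logic.

```python
# Minimal loading sketch (assumptions: JSON config = constructor kwargs,
# checkpoint = plain state dict; load_pretrained is a hypothetical helper).
import json
import torch

def load_pretrained(name, registry=MODEL_REGISTRY, device="cuda:0"):
    entry = registry[name]
    with open(entry["config"]) as f:
        config = json.load(f)                       # model hyperparameters
    model = entry["cls"](**config)                  # e.g. VTT_ReAll(**config)
    state_dict = torch.load(entry["checkpoint"], map_location=device)
    model.load_state_dict(state_dict)
    return model.to(device).eval()                  # inference mode for downstream use

# encoder = load_pretrained("vt20t-reall-tmr05-bin-ft+dataset-BottleCap")
```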
The repository includes a downstream task for bottle cap manipulation. To train the ShadowHand policy for this task, execute the following command:

```bash
python train_agent.py --task bottle_cap_vt --seed 123
```

Note: The training process requires at least two NVIDIA RTX 3090 GPUs: one for model training and the other for image rendering. Set the following environment variables accordingly:
```python
import os

os.environ["MUJOCO_GL"] = "egl"
os.environ["MUJOCO_EGL_DEVICE_ID"] = "1"  # '1' is for image rendering; '0' is used for training
```
We provide a pre-trained policy that can be downloaded from this link. You can also train your own model using the instructions above.

To perform testing, use the following command:
```bash
python eval_agent.py --task bottle_cap_vt --seed 123 --resume_model path/to/your/model.pt --test
```

The test results will be printed to the console and saved to a specified file.
To save the manipulation process as a video, run:
```bash
python eval_agent.py --task bottle_cap_vt --seed 123 --resume_model path/to/your/model.pt --test --env_vis
```

The video will be saved in the runs/videos directory.
This project is licensed under the MIT License - see the LICENSE file for details.
If you have any questions or need support, please contact Qingtao Liu or Qi Ye.
If you find our work useful, please consider citing:

```bibtex
@inproceedings{liu2024m2vtp,
  title={Masked Visual-Tactile Pre-training for Robot Manipulation},
  author={Liu, Qingtao and Ye, Qi and Sun, Zhengnan and Cui, Yu and Li, Gaofeng and Chen, Jiming},
  booktitle={2024 IEEE International Conference on Robotics and Automation (ICRA)},
  year={2024},
  organization={IEEE}
}
```